WikipediaXmlSplitter spits one chunk per line
---------------------------------------------

                 Key: MAHOUT-183
                 URL: https://issues.apache.org/jira/browse/MAHOUT-183
             Project: Mahout
          Issue Type: Bug
          Components: Classification
    Affects Versions: 0.2
            Reporter: Olivier Grisel
             Fix For: 0.2


The inner loop of the Wikipedia XML splitter erroneously detects the end of 
the line iterator, which causes it to create chunks holding just one line of 
page content instead of respecting the --chunkSize CLI option.
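
For illustration only (this is not the patch mentioned in this issue, nor
Mahout's actual code), the intended behaviour can be sketched as below. The
class name ChunkSketch, the chunk() helper and the use of a character count
as the size threshold are all assumptions made for the example; the real
splitter may measure chunkSize in a different unit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: not part of Mahout.
public class ChunkSketch {

    // Groups lines into chunks. A chunk is closed only once it holds at
    // least chunkSize characters (assumed unit), or when the line iterator
    // is really exhausted. The bug described above corresponds to closing
    // a chunk after every single line instead.
    public static List<String> chunk(Iterator<String> lines, int chunkSize) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        while (lines.hasNext()) {
            current.append(lines.next()).append('\n');
            if (current.length() >= chunkSize) {
                chunks.add(current.toString());
                current.setLength(0); // start a new, empty chunk
            }
        }
        // Flush whatever is left as a final, possibly smaller, chunk.
        if (current.length() > 0) {
            chunks.add(current.toString());
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("aaaa", "bbbb", "cccc", "dddd", "ee");
        List<String> chunks = chunk(lines.iterator(), 10);
        System.out.println(chunks.size() + " chunks");
    }
}
```

With a threshold of 10 characters, the five input lines above end up in
three chunks rather than five, since lines are accumulated until the
threshold is reached.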

A simple patch to fix this will follow.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
