WikipediaXmlSplitter spits one chunk per line
---------------------------------------------
Key: MAHOUT-183
URL: https://issues.apache.org/jira/browse/MAHOUT-183
Project: Mahout
Issue Type: Bug
Components: Classification
Affects Versions: 0.2
Reporter: Olivier Grisel
Fix For: 0.2
The inner loops of the Wikipedia XML splitter erroneously detect the end of
the line iterators, which causes the splitter to create chunks containing just
one line of page content instead of respecting the --chunkSize CLI option.
A simple patch to fix this will follow.
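
For illustration only, the intended behaviour can be sketched as below. This is
not the Mahout source and not the patch mentioned above: the class name
ChunkSketch, the method split, and the use of a character count as the size
limit are all assumptions made for this example. It shows a loop that closes a
chunk only once the accumulated content reaches the limit, rather than after
every line, and flushes the remainder when the iterator is really exhausted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch; not the actual WikipediaXmlSplitter code or its patch.
final class ChunkSketch {

  // Groups lines into chunks of at least maxChunkChars characters each
  // (the last chunk may be smaller). The character-based limit is an
  // assumption standing in for the --chunkSize option.
  static List<String> split(Iterator<String> lines, int maxChunkChars) {
    List<String> chunks = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    while (lines.hasNext()) {
      current.append(lines.next()).append('\n');
      // Close the chunk only when the size limit is reached,
      // not after each line.
      if (current.length() >= maxChunkChars) {
        chunks.add(current.toString());
        current.setLength(0);
      }
    }
    // Flush the remaining lines once the iterator is exhausted.
    if (current.length() > 0) {
      chunks.add(current.toString());
    }
    return chunks;
  }

  public static void main(String[] args) {
    List<String> lines = Arrays.asList("aaaa", "bbbb", "cccc", "dddd", "eeee");
    List<String> chunks = split(lines.iterator(), 10);
    System.out.println(chunks.size() + " chunks");
  }
}
```

With five 4-character lines and a limit of 10 characters, the sketch groups the
lines two per chunk and puts the last line in a chunk of its own, instead of
producing one chunk per line.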