Bad bug in ChunkedWriter
------------------------
Key: MAHOUT-809
URL: https://issues.apache.org/jira/browse/MAHOUT-809
Project: Mahout
Issue Type: Bug
Components: Clustering
Affects Versions: 0.5, 0.6
Environment: Occurred on single node, not tested on cluster.
Reporter: Florian Bausch
org.apache.mahout.text.ChunkedWriter has a bug that causes data loss once the
maximum chunk size is reached: the first chunk is overwritten, after which
writing continues normally.
The cause is the post-increment in line 58, which passes the old chunk ID to
getPath and increments it only afterwards:
    writer = new SequenceFile.Writer(fs, conf, getPath(currentChunkID++),
        Text.class, Text.class);
The fix should look like this:
    writer = new SequenceFile.Writer(fs, conf, getPath(++currentChunkID),
        Text.class, Text.class);
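
To illustrate the difference between the two lines, here is a minimal
standalone sketch that records which chunk IDs would be opened. It is not the
Mahout source: the class ChunkIdDemo and its methods are hypothetical, and it
assumes that the writer opens chunk 0 up front and that currentChunkID starts
at 0, which matches the behaviour described above but is not quoted from the
code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the chunk rollover logic; not the Mahout source.
class ChunkIdDemo {

    // Chunk IDs opened when the rollover uses getPath(currentChunkID++).
    static List<Integer> openedWithPostIncrement(int rollovers) {
        List<Integer> opened = new ArrayList<>();
        int currentChunkID = 0;
        opened.add(currentChunkID); // assumed: chunk 0 is opened at start
        for (int i = 0; i < rollovers; i++) {
            // Post-increment yields the old value, then increments.
            opened.add(currentChunkID++);
        }
        return opened;
    }

    // Chunk IDs opened when the rollover uses getPath(++currentChunkID).
    static List<Integer> openedWithPreIncrement(int rollovers) {
        List<Integer> opened = new ArrayList<>();
        int currentChunkID = 0;
        opened.add(currentChunkID); // assumed: chunk 0 is opened at start
        for (int i = 0; i < rollovers; i++) {
            // Pre-increment increments first, then yields the new value.
            opened.add(++currentChunkID);
        }
        return opened;
    }

    public static void main(String[] args) {
        System.out.println(openedWithPostIncrement(3)); // [0, 0, 1, 2]
        System.out.println(openedWithPreIncrement(3));  // [0, 1, 2, 3]
    }
}
```

Under these assumptions, the post-increment variant opens ID 0 twice, so the
first rollover reuses the path of the first chunk, while the pre-increment
variant gives every chunk its own ID.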