[ https://issues.apache.org/jira/browse/MAHOUT-249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Olivier Grisel updated MAHOUT-249:
----------------------------------

    Attachment: MAHOUT-249-WikipediaXMLSplitterHDFS.patch

Patch attached. Note that the old behaviour is preserved by default: chunks are created on the local FS without CRC checksums.

> Make WikipediaXmlSplitter able to write the chunks directly to HDFS or S3
> -------------------------------------------------------------------------
>
>                 Key: MAHOUT-249
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-249
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Classification
>    Affects Versions: 0.2
>            Reporter: Olivier Grisel
>            Priority: Minor
>             Fix For: 0.3
>
>         Attachments: MAHOUT-249-WikipediaXMLSplitterHDFS.patch
>
>
> By using the Hadoop FS abstraction it should be possible to avoid writing the
> chunks to the local hard drive before uploading them to HDFS or S3.
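For illustration only (this is not the attached patch), a minimal sketch of what writing a chunk through the Hadoop FileSystem abstraction looks like. The class name, the chunk path argument, and the placeholder content are made up; the point is that the URI scheme of the output path selects the filesystem implementation, so the same code can target the local FS, HDFS, or S3:

    import java.io.OutputStreamWriter;
    import java.io.Writer;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ChunkWriterSketch {
      public static void main(String[] args) throws Exception {
        // The scheme of the path picks the filesystem, e.g.
        // file:///tmp/chunk-0001.xml, hdfs://namenode:9000/chunks/chunk-0001.xml,
        // or s3://bucket/chunks/chunk-0001.xml (S3 credentials via the Configuration).
        Path chunkPath = new Path(args[0]);
        Configuration conf = new Configuration();
        FileSystem fs = chunkPath.getFileSystem(conf);
        Writer writer = new OutputStreamWriter(fs.create(chunkPath), "UTF-8");
        try {
          // A real splitter would stream one chunk of the Wikipedia dump here.
          writer.write("<mediawiki>...</mediawiki>\n");
        } finally {
          writer.close();
        }
      }
    }

The checksum-free local default mentioned above presumably corresponds to using the raw local filesystem (RawLocalFileSystem) rather than the checksummed LocalFileSystem wrapper, which writes .crc side files next to each chunk.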