We already have something pretty close in org.apache.mahout.clustering.conversion.InputDriver. It could be generalized to handle arbitrary delimiters (it currently splits on spaces).
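To make the suggestion concrete, here is a rough sketch of what configurable delimiter handling might look like. It is not taken from the InputDriver source: the class name DelimitedLineParser and its parse method are hypothetical, and it only covers turning one text line into numeric values, not wrapping them in a VectorWritable or wiring them into a mapper.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Mahout: parses one line of delimited
// numeric text into a double[] using a caller-supplied delimiter regex.
final class DelimitedLineParser {

    private final Pattern delimiter;

    // delimiterRegex is a regular expression, e.g. "," for CSV,
    // "\\s+" for runs of whitespace, or "\t" for tab-separated input.
    DelimitedLineParser(String delimiterRegex) {
        this.delimiter = Pattern.compile(delimiterRegex);
    }

    // Returns the numeric values on the line, or an empty array for a
    // blank line. Throws NumberFormatException if any field is empty
    // or not numeric (for example the middle field of "1,,3").
    double[] parse(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return new double[0];
        }
        String[] tokens = delimiter.split(trimmed);
        double[] values = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            values[i] = Double.parseDouble(tokens[i].trim());
        }
        return values;
    }
}
```

With something like this, new DelimitedLineParser(",").parse("1.0, 2.5,3") would handle a CSV line, and passing "\\s+" instead would cover space-separated input, so the delimiter could become a job option rather than a fixed choice.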

-----Original Message-----
From: Sean Owen (JIRA) [mailto:[email protected]]
Sent: Sunday, August 07, 2011 11:10 AM
To: [email protected]
Subject: [jira] [Commented] (MAHOUT-781) universal map-reduce job to convert csv file to vectorwritable sequencefile

    [ https://issues.apache.org/jira/browse/MAHOUT-781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080613#comment-13080613 ]

Sean Owen commented on MAHOUT-781:
----------------------------------

Xiaobo, the process is that you first explain what you are trying to accomplish, then propose a patch, then it is reviewed, and then it is committed. I think you have skipped the first step.

Where does this become useful to Mahout? I can sort of imagine it as a utility class, but not in core. What is test-data.zip? Is there a need to preserve line number in the output? seems like no reducer is needed.

> universal map-reduce job to convert csv file to vectorwritable sequencefile
> ---------------------------------------------------------------------------
>
>                 Key: MAHOUT-781
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-781
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Classification
>    Affects Versions: 0.6
>            Reporter: XiaoboGu
>            Priority: Minor
>         Attachments: csv2seq.patch, csv2seq.patch, test-data.zip
>
>

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
