RE: [jira] [Commented] (MAHOUT-781) universal map-reduce job to convert csv file to vectorwritable sequencefile

Jeff Eastman Mon, 08 Aug 2011 08:28:02 -0700

We already have something pretty close in
org.apache.mahout.clustering.conversion.InputDriver. It could be generalized to
handle arbitrary delimiters (it currently uses a space).
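
To illustrate the generalization, here is a minimal sketch of parsing one
line with a configurable delimiter. It does not reproduce InputDriver's
actual code; the class name DelimitedLineParser and its methods are
hypothetical, and only the JDK is used so it runs on its own.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Mahout: turns one delimited text line
// into an array of doubles using a caller-supplied delimiter.
final class DelimitedLineParser {
    private final Pattern delimiter;

    DelimitedLineParser(String delimiter) {
        // Quote the delimiter so characters such as '|' or '.' are matched
        // literally instead of being interpreted as regex syntax.
        this.delimiter = Pattern.compile(Pattern.quote(delimiter));
    }

    double[] parse(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return new double[0]; // blank line: no values
        }
        String[] fields = delimiter.split(trimmed);
        double[] values = new double[fields.length];
        for (int i = 0; i < fields.length; i++) {
            // Throws NumberFormatException on empty or non-numeric fields.
            values[i] = Double.parseDouble(fields[i].trim());
        }
        return values;
    }

    public static void main(String[] args) {
        DelimitedLineParser csv = new DelimitedLineParser(",");
        System.out.println(Arrays.toString(csv.parse("1.0, 2.5, 3")));
    }
}
```

In a Mahout job the resulting array could then be wrapped in a DenseVector
and a VectorWritable before being written out; that step is left out here
to keep the sketch free of Hadoop and Mahout dependencies.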


-----Original Message-----
From: Sean Owen (JIRA) [mailto:[email protected]] 
Sent: Sunday, August 07, 2011 11:10 AM
To: [email protected]
Subject: [jira] [Commented] (MAHOUT-781) universal map-reduce job to convert csv file to vectorwritable sequencefile


    [ https://issues.apache.org/jira/browse/MAHOUT-781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080613#comment-13080613 ]

Sean Owen commented on MAHOUT-781:
----------------------------------

Xiaobo, the process is that you first explain what you are trying to 
accomplish, then propose a patch, then it is reviewed, and then it is 
committed. I think you have skipped the first step.

Where does this become useful to Mahout? I can sort of imagine it as a utility 
class, but not in core.
What is test-data.zip?
Is there a need to preserve line numbers in the output? It seems like no
reducer is needed.

> universal map-reduce job to convert csv file to vectorwritable sequencefile
> ---------------------------------------------------------------------------
>
>                 Key: MAHOUT-781
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-781
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Classification
>    Affects Versions: 0.6
>            Reporter: XiaoboGu
>            Priority: Minor
>         Attachments: csv2seq.patch, csv2seq.patch, test-data.zip
>
>


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
