[
https://issues.apache.org/jira/browse/MAHOUT-781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080769#comment-13080769
]
Dan Brickley commented on MAHOUT-781:
-------------------------------------
A utility does sound useful. Good idea, Xiaobo.
I was happy to find Danny Bickson's post here -
http://bickson.blogspot.com/2011/02/mahout-svd-matrix-factorization.html -
which offers a simple CSV importer. It takes sparse from/to/value affinity tuples
and converts them into (and out of) a Mahout binary representation. Would
MAHOUT-781 include this functionality?
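
To make the tuple format concrete, here is a minimal Python sketch of grouping from/to/value tuples into sparse rows and flattening them back out. It is neither the code from the blog post nor the Mahout API: the names `triples_to_rows` and `rows_to_triples` are hypothetical, and a plain dict of dicts stands in for the Mahout binary representation.

```python
from collections import defaultdict


def triples_to_rows(triples):
    """Group (from, to, value) tuples into sparse rows: {from: {to: value}}.

    Assumption: a later duplicate (from, to) pair overwrites an earlier one.
    """
    rows = defaultdict(dict)
    for src, dst, value in triples:
        rows[src][dst] = value
    return dict(rows)


def rows_to_triples(rows):
    """Flatten sparse rows back into (from, to, value) tuples, sorted by key."""
    out = []
    for src in sorted(rows):
        for dst in sorted(rows[src]):
            out.append((src, dst, rows[src][dst]))
    return out
```

Only the non-zero entries are stored, which is the point of the sparse tuple form. Whether duplicate (from, to) pairs should overwrite, sum, or be rejected is another thing a spec would need to settle.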
It would be good to have a spec. There are lots of subtle variations on the CSV
theme.
Can lines containing #-prefixed comments be included? Are extra blank lines
acceptable or do they cause an error? Are header fields represented somehow
inline, or only in a separate --header document? Is whitespace between field
values included in the values we pass on, or silently discarded?
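
One way to pin these questions down is to make each one an explicit option. The Python sketch below is only an illustration of that idea, not a proposal for the patch's actual behaviour: the function `parse_csv` and its parameter names are hypothetical, and the defaults are arbitrary choices.

```python
import csv


def parse_csv(text, allow_comments=True, skip_blank=True,
              strip_fields=True, has_header=False):
    """Parse CSV text with each policy spelled out as a flag.

    Returns (header, rows); header is None unless has_header is True.
    """
    lines = []
    for number, raw in enumerate(text.splitlines(), start=1):
        if not raw.strip():
            if skip_blank:
                continue  # policy: blank lines are ignored
            raise ValueError("blank line at line %d" % number)
        if allow_comments and raw.lstrip().startswith("#"):
            continue  # policy: '#'-prefixed lines are comments
        lines.append(raw)

    header = None
    rows = []
    for fields in csv.reader(lines):
        if strip_fields:
            # policy: whitespace around field values is discarded
            fields = [field.strip() for field in fields]
        if has_header and header is None:
            header = fields  # policy: first data line is an inline header
            continue
        rows.append(fields)
    return header, rows
```

For example, `parse_csv("# ids\nfrom, to, value\n\n1, 2, 0.5\n", has_header=True)` yields the header `["from", "to", "value"]` and a single row `["1", "2", "0.5"]`. With `strip_fields=False` the leading spaces stay in the values, which is exactly the kind of difference a spec should state.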
If the utility also covers conversion back to CSV, it should be possible to
test round-tripping...
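
A round-trip check could look roughly like the Python sketch below. It assumes integer from/to ids and float values, and it round-trips through CSV text only; a real test would go through the sequence file instead. The names `write_triples`, `read_triples` and `round_trips` are hypothetical.

```python
import csv
import io


def write_triples(triples):
    """Serialise (from, to, value) tuples as CSV text, one tuple per line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for src, dst, value in triples:
        # repr() of a Python float parses back to the same float
        writer.writerow([src, dst, repr(value)])
    return buf.getvalue()


def read_triples(text):
    """Parse CSV text back into (int, int, float) tuples."""
    reader = csv.reader(io.StringIO(text))
    return [(int(src), int(dst), float(value)) for src, dst, value in reader]


def round_trips(triples):
    """True if writing then reading gives back exactly the same tuples."""
    return read_triples(write_triples(triples)) == triples
```

The same idea works in the other direction: for CSV text already in canonical form, reading and then writing it should reproduce the text byte for byte.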
> universal map-reduce job to convert csv file to vectorwritable sequencefile
> ---------------------------------------------------------------------------
>
> Key: MAHOUT-781
> URL: https://issues.apache.org/jira/browse/MAHOUT-781
> Project: Mahout
> Issue Type: Improvement
> Components: Classification
> Affects Versions: 0.6
> Reporter: XiaoboGu
> Priority: Minor
> Attachments: csv2seq.patch, csv2seq.patch, test-data.zip
>
>