[ https://issues.apache.org/jira/browse/MAHOUT-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13714057#comment-13714057 ]
Alex Franchuk commented on MAHOUT-1287:
---------------------------------------
No problem. I'm just glad it's getting out there so others can have an easier
time with it.
Also, the same solution could be used to standardize the CSV parsing in the
classifier.df.DataLoader and classifier.df.DataConverter classes, which
currently split fields on commas or spaces. I'm not sure how actively those
classes are maintained at the moment, though.
> classifier.sgd.CsvRecordFactory incorrectly parses CSV format
> -------------------------------------------------------------
>
> Key: MAHOUT-1287
> URL: https://issues.apache.org/jira/browse/MAHOUT-1287
> Project: Mahout
> Issue Type: Bug
> Components: Classification
> Affects Versions: 0.7
> Reporter: Alex Franchuk
> Priority: Minor
> Labels: csv, parser
> Attachments: CsvRecordFactory_CsvParseFix.patch
>
> Original Estimate: 0h
> Remaining Estimate: 0h
>
> CsvRecordFactory uses naive CSV parsing and incorrectly splits CSV lines
> that contain double-quoted fields with embedded commas.
> This problem also affects the command-line demo programs which use
> CsvRecordFactory (mostly the sgd-related programs).
> Attached is a patch to fix the problem.
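To illustrate the failure mode described above, here is a minimal, self-contained Java sketch contrasting a naive comma split with a quote-aware split. `QuotedCsvSplit` and `splitCsvLine` are hypothetical names used only for illustration; this is not the code in CsvRecordFactory_CsvParseFix.patch. It assumes RFC 4180-style quoting (fields may be wrapped in double quotes, and `""` inside a quoted field stands for one `"`) and does not handle newlines inside fields.

```java
import java.util.ArrayList;
import java.util.List;

public class QuotedCsvSplit {

    // Hypothetical helper (not Mahout API): splits a single CSV line,
    // treating commas inside double-quoted fields as literal text and
    // "" inside a quoted field as an escaped double quote.
    // Assumption: the line contains no embedded newlines.
    public static List<String> splitCsvLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // "" -> literal quote
                        i++;                 // skip the second quote
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);       // commas are kept inside quotes
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                fields.add(current.toString()); // field boundary
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field
        return fields;
    }

    public static void main(String[] args) {
        String line = "1,\"Smith, John\",42";
        String[] naive = line.split(",");
        List<String> parsed = splitCsvLine(line);
        System.out.println(naive.length);   // 4: the quoted field is cut in two
        System.out.println(parsed.size());  // 3
        System.out.println(parsed.get(1));  // Smith, John
    }
}
```

A hand-rolled state machine keeps the sketch dependency-free; in practice a maintained CSV library is usually the safer choice, especially if fields can span multiple lines.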