Hi Sebastian,
I just noticed that. My mistake, I will open a ticket.

Thanks,
Alex

-----Original Message-----
From: Sebastian Schelter [mailto:[email protected]] 
Sent: Thursday, July 18, 2013 1:33 PM
To: [email protected]
Subject: Re: classifier.sgd.CsvRecordFactory incorrect CSV parsing

Hello Alex,

Thank you for being willing to contribute. Unfortunately, you cannot send 
attachments via this list. Could you open a JIRA ticket at 
https://issues.apache.org/jira/browse/MAHOUT and upload your patch there?

-sebastian


2013/7/18 Alexander Franchuk <[email protected]>

> Hi All,
>
> I’ve been working with Mahout for an internship this summer, and in 
> the process I noticed that the CsvRecordFactory class parses CSV 
> files incorrectly. I made a fix for this, which is in the attached 
> patch file.
> It’s not a huge change, but I thought it would be helpful for 
> people. It will also stop the demo programs in the Mahout 
> distribution from failing due to incorrect parsing of CSV files. For 
> instance, if you have a double-quoted field with a comma in it, the 
> demo programs will incorrectly split the field into two, which in 
> some cases causes parsing failures; even when the program doesn’t 
> fail, it will of course produce incorrect results.
>
> This patch makes the class use the solr-commons-csv.jar file, 
> which I noticed was included in the Mahout distribution.
>
> Hope this helps! And thanks for all your work, my experience with 
> Mahout has been great so far.
>
> Alex Franchuk
>
