Hi All,
I've been working with Mahout for an internship this summer, and in the process 
I noticed that the CsvRecordFactory class parses CSV files incorrectly. 
So I made a fix for this, which is in the attached patch file. It's not a huge 
change, but I thought it would be helpful for people. It also keeps the demo 
programs in the Mahout distribution from failing because of the same parsing 
problem. For instance, if a double-quoted field contains a comma, the demo 
programs incorrectly split that field in two. In some cases this causes 
parsing errors, and even when the program doesn't fail, it produces incorrect 
results.
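To illustrate the failure mode, here is a minimal sketch that assumes the original code splits each line on every comma. The class and method names (CsvSplit, splitLine) are hypothetical and this is not the code in the patch, which relies on a library parser instead. It only shows the quoting rule a correct parser has to follow: commas inside double quotes are literal, and "" inside a quoted field is an escaped quote. It also assumes each record fits on one line (no embedded newlines).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not Mahout's CsvRecordFactory.
public class CsvSplit {

    // Splits one CSV line into fields, honoring double-quoted fields.
    // Assumption: a record never spans multiple lines.
    public static List<String> splitLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        field.append('"'); // "" inside quotes is a literal quote
                        i++;
                    } else {
                        inQuotes = false;  // closing quote
                    }
                } else {
                    field.append(c);       // commas here are part of the field
                }
            } else if (c == '"') {
                inQuotes = true;           // opening quote
            } else if (c == ',') {
                fields.add(field.toString()); // unquoted comma ends the field
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString());      // last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        String line = "1,\"Smith, John\",42";
        // Naive splitting breaks the quoted field in two.
        System.out.println(line.split(",").length);      // 4
        // Quote-aware splitting keeps it intact.
        System.out.println(splitLine(line).size());      // 3
        System.out.println(splitLine(line).get(1));      // Smith, John
    }
}
```

With three real columns, the naive split yields four pieces, which shifts every later column and explains both the parse failures and the silently wrong results.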

This patch makes the class use the CSV parser from solr-commons-csv.jar, which I 
noticed was included in the Mahout distribution.

Hope this helps! And thanks for all your work; my experience with Mahout has 
been great so far.
Alex Franchuk
