Now that the CSVFileSource is in Crunch 0.8.3, I’ve been trying to integrate it 
into the project that originally spurred its creation. However, I’m running 
into some weird issues.

Reading a new CSVFileSource and then directly materializing and using the 
result works fine; that scenario is already covered in CSVFileSourceIT:
https://github.com/apache/crunch/blob/apache-crunch-0.8.3/crunch-core/src/it/java/org/apache/crunch/io/text/csv/CSVFileSourceIT.java#L41

But as soon as I try to do something extra with that PCollection (say, use 
count() to turn it into a PTable, grab its key set, then print it out), 
everything falls apart.

New test:
https://github.com/champgm/crunch/blob/master/crunch-core/src/it/java/org/apache/crunch/io/text/csv/CSVFileSourceIT.java#L56

Result:
http://pastebin.com/f7iUQ73N

It seems that, when some additional actions are added to the pipeline, a 
CSVRecordReader is being created in CrunchRecordReader without going through 
the CSVFileSource or CSVInputFormat flow, where its various parsing options are 
normally configured.

I was able to fix this issue by copying the “configure” method from 
CSVInputFormat and adding its logic to the beginning of the “initialize” method 
of CSVRecordReader, which forces the reader to check the job config and 
configure itself if any of its options are null. I don’t really feel like this 
is ideal, though. Did I miss something when I was designing this set of 
classes? Is this behavior expected?
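
To make the workaround concrete, here is a minimal, self-contained sketch of 
the pattern. It is not the actual Crunch code: SketchCsvReader, the config 
keys, and the default values are all made up for illustration, and a plain Map 
stands in for the Hadoop job Configuration. The only point is that 
“initialize” fills in any option that is still null from the job config, so a 
reader built through the no-argument path still ends up configured.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for CSVRecordReader; names and keys are assumptions.
class SketchCsvReader {
    // Hypothetical config keys, invented for this sketch.
    static final String BUFFER_SIZE_KEY = "sketch.csv.buffersize";
    static final String ENCODING_KEY = "sketch.csv.inputfileencoding";
    static final int DEFAULT_BUFFER_SIZE = 64 * 1024;
    static final String DEFAULT_ENCODING = "UTF-8";

    private Integer bufferSize; // stays null until set explicitly or by initialize
    private String encoding;    // stays null until set explicitly or by initialize

    // Path taken when something else builds the reader without any options.
    SketchCsvReader() {
    }

    // Path taken when the input format builds the reader with explicit options.
    SketchCsvReader(int bufferSize, String encoding) {
        this.bufferSize = bufferSize;
        this.encoding = encoding;
    }

    // A Map stands in for the Hadoop job Configuration in this sketch.
    void initialize(Map<String, String> jobConf) {
        // Only fill in options that were never set; explicit values win.
        if (bufferSize == null) {
            String raw = jobConf.get(BUFFER_SIZE_KEY);
            bufferSize = (raw != null) ? Integer.parseInt(raw) : DEFAULT_BUFFER_SIZE;
        }
        if (encoding == null) {
            encoding = jobConf.getOrDefault(ENCODING_KEY, DEFAULT_ENCODING);
        }
    }

    int getBufferSize() {
        return bufferSize;
    }

    String getEncoding() {
        return encoding;
    }
}

class ConfigureOnInitializeSketch {
    public static void main(String[] args) {
        Map<String, String> jobConf = new HashMap<>();
        jobConf.put(SketchCsvReader.BUFFER_SIZE_KEY, "1024");

        // Reader created without options, as in the failing scenario.
        SketchCsvReader reader = new SketchCsvReader();
        reader.initialize(jobConf);
        System.out.println(reader.getBufferSize() + " " + reader.getEncoding());
    }
}
```

The downside, and the reason this doesn’t feel ideal to me, is that the 
configuration logic ends up living in two places.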
