Some users of the CSV Input Format at Cerner ran into problems with client CSV files that contain stray, unescaped double quotes inside fields (presumably representing inches). Bureaucratic obstacles kept us from getting those files reliably cleaned up, so we brainstormed a way to make the CSV Input Format ignore the stray quotes and pass them through to be handled by whatever parsing step comes later. We are implementing this in our copy of the input format, and it seems to be working so far.
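To make "stray quote" concrete, here is a minimal sketch of one possible heuristic, written in Python for brevity. It is not the code in our copy of the input format and not part of Crunch; the function name `split_lenient` is made up for illustration. It assumes one record per line, and that inside a quoted field a double quote only closes the field when it is followed by the delimiter or the end of the line. Any other quote is kept as a literal character and passed forward.

```python
def split_lenient(line, delimiter=","):
    """Split one CSV line, keeping stray double quotes as literal characters.

    Hypothetical helper for illustration only; assumes a record never
    spans more than one line.
    """
    fields, buf = [], []
    in_quotes = False
    at_field_start = True
    i, n = 0, len(line)
    while i < n:
        ch = line[i]
        nxt = line[i + 1] if i + 1 < n else None
        if in_quotes:
            if ch == '"' and nxt == '"':
                # Properly escaped quote ("") inside a quoted field.
                buf.append('"')
                i += 2
                continue
            if ch == '"' and (nxt is None or nxt == delimiter):
                # A quote followed by the delimiter or end of line closes the field.
                in_quotes = False
            else:
                # Ordinary character, or a stray quote that is passed forward as-is.
                buf.append(ch)
        elif ch == '"' and at_field_start:
            # A quote at the very start of a field opens a quoted field.
            in_quotes = True
        elif ch == delimiter:
            fields.append("".join(buf))
            buf = []
            at_field_start = True
            i += 1
            continue
        else:
            # Includes stray quotes in the middle of an unquoted field.
            buf.append(ch)
        at_field_start = False
        i += 1
    fields.append("".join(buf))
    return fields


print(split_lenient('1,"2" pipe, steel",x'))
```

The heuristic is inherently ambiguous: a stray quote that happens to sit directly before a delimiter inside a quoted field (for example `"6", tall"`) would still be read as the closing quote, so downstream parsing has to tolerate some mis-split records.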
My question: is this something we should log a JIRA for and contribute back to Crunch? It’s handy in our case, but the files are truly malformed and do not follow the CSV standard. Should the CSVInputFormat have configurable options for handling malformed files and passing bad records forward, or is the current behavior (blow up and give some information about where the bad records start) the way it should behave?

Thanks for your input,
Mac