Some users of the CSV Input Format at Cerner ran into problems with CSV files
from clients that contained stray, unescaped double quotes inside fields
(ostensibly representing inches). Bureaucratic hurdles prevented us from
getting those files reliably cleaned up, so we brainstormed and figured out a
way to make the CSV Input Format ignore the stray quotes and pass them forward
to be handled by whatever parsing step comes later. We are implementing this
in our copy of the input format, and it seems to be working so far.
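To make the idea concrete, here is one possible rule, written as a simplified,
hypothetical sketch. It is not the actual CSVInputFormat code and not
necessarily what our copy does, and the class and method names are made up. A
quote only opens a quoted field when it appears at the start of a field.
Inside a quoted field, a quote only closes the field when it is followed by the
delimiter or the end of the line. Any other quote is kept as a literal
character. The sketch handles a single line only, whereas the real input format
also has to deal with quoted fields that span lines and input splits.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified lenient CSV line splitter (illustration only).
 * Stray quotes that cannot open or close a quoted field are kept literally
 * instead of causing a parse failure. Single line only: no embedded newlines.
 */
class LenientCsvSplitter {
  private final char delimiter;
  private final char quote;

  LenientCsvSplitter(char delimiter, char quote) {
    this.delimiter = delimiter;
    this.quote = quote;
  }

  List<String> split(String line) {
    List<String> fields = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    boolean inQuotes = false;
    boolean atFieldStart = true;
    int i = 0;
    while (i < line.length()) {
      char c = line.charAt(i);
      if (inQuotes && c == quote) {
        boolean hasNext = i + 1 < line.length();
        if (hasNext && line.charAt(i + 1) == quote) {
          // Standard escaped quote ("") inside a quoted field.
          current.append(quote);
          i += 2;
          continue;
        }
        if (!hasNext || line.charAt(i + 1) == delimiter) {
          // Quote followed by the delimiter or end of line closes the field.
          inQuotes = false;
          i++;
          continue;
        }
        // Stray quote inside a quoted field: keep it as a literal character.
        current.append(c);
      } else if (!inQuotes && c == delimiter) {
        // End of an unquoted (or already closed) field.
        fields.add(current.toString());
        current.setLength(0);
        atFieldStart = true;
        i++;
        continue;
      } else if (!inQuotes && c == quote && atFieldStart) {
        // A quote at the very start of a field opens a quoted field.
        inQuotes = true;
      } else {
        // Ordinary character, including a stray quote mid-field such as 12".
        current.append(c);
      }
      atFieldStart = false;
      i++;
    }
    fields.add(current.toString());
    return fields;
  }
}
```

With this rule, `a,12" pipe,b` splits into `a`, `12" pipe`, and `b` instead of
failing. The trade-off is that it reduces failures rather than removing
ambiguity: a stray quote that happens to sit right before a delimiter inside a
quoted field is still read as a closing quote.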

My question is: should we log a JIRA for this and submit our work to Crunch as
well? It’s handy in our case, but the files are genuinely malformed and do not
follow the CSV standard. Should CSVInputFormat have configurable options for
handling malformed files and passing bad records forward, or is the current
behavior (blowing up and giving some information about where the bad records
start) the way it really should behave?

Thanks for your input,
Mac
