Arun,

I'm also using Ctrl+A as a delimiter and ran into the same problem. I haven't had 
time to write up a PR, but it looked like a pretty easy fix to me too.

I can't merge the change if you submit it, but I'd be happy to review it.

--Peter

-----Original Message-----
From: Arun Manivannan [mailto:a...@arunma.com] 
Sent: Sunday, September 24, 2017 11:17 PM
To: Dev@nifi.apache.org
Subject: [EXT] ConvertCSVToAvro vs CSVReader - Value Delimiter

Hi,

The ConvertCSVToAvro processor has been having performance issues when 
processing files larger than 1 GB, and it was suggested that I use 
ConvertRecord, which leverages the RecordReader and Writer. I ran some tests, 
and it does perform well.

Strangely, the CSVReader doesn't accept a Unicode character as the value 
delimiter, and my CSV uses the Control-A (\u0001) character as its delimiter.

After some analysis, it looks like a minor change is needed in CSVUtils to 
unescape the delimiter (as ConvertCSVToAvro does), along with a corresponding 
change to SingleCharacterValidator.
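
To illustrate the idea, here is a minimal Java sketch. It is NOT NiFi's actual 
CSVUtils or SingleCharacterValidator code: the class and method names are 
hypothetical, and the set of escape forms handled (a backslash-u followed by 
four hex digits, plus \t, \n, \r) is an assumption. The point is only the 
ordering: unescape the user-entered value first, then check that exactly one 
character remains.

```java
/**
 * Hypothetical sketch only; names do not correspond to NiFi classes.
 * Turns a user-entered delimiter such as "\\u0001" or "\\t" into the
 * single character it denotes, then validates the unescaped result.
 */
public class DelimiterSketch {

    /** Unescapes a few common escape forms; returns other input unchanged. */
    static String unescapeDelimiter(String raw) {
        if (raw == null) {
            return null;
        }
        switch (raw) {
            case "\\t": return "\t";
            case "\\n": return "\n";
            case "\\r": return "\r";
            default: break;
        }
        // Six-character form: a literal backslash, 'u', then exactly four hex digits.
        if (raw.matches("\\\\u[0-9a-fA-F]{4}")) {
            int codeUnit = Integer.parseInt(raw.substring(2), 16);
            return String.valueOf((char) codeUnit);
        }
        // Anything else (including malformed escapes) is left as typed,
        // so the length check below rejects it if it is not one character.
        return raw;
    }

    /** Validation applied AFTER unescaping: exactly one character is required. */
    static boolean isSingleCharacter(String raw) {
        String value = unescapeDelimiter(raw);
        return value != null && value.length() == 1;
    }

    public static void main(String[] args) {
        String[] inputs = {",", "\\t", "\\u0001", "\\u00zz", "ab"};
        for (String in : inputs) {
            System.out.println(in + " -> valid=" + isSingleCharacter(in));
        }
    }
}
```

Validating the raw property text directly would see "\u0001" as six characters 
and reject it, which is presumably why both the unescaping and the validator 
need to change together.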

Please let me know if you don't think this is an issue or if there's a 
workaround for it. Otherwise, I'd be more than happy to raise an issue and 
submit a PR for review.

Best Regards,
Arun
