Hi All,

Just raised a PR (https://github.com/apache/nifi/pull/2172) for JIRA NIFI-4416 <https://issues.apache.org/jira/browse/NIFI-4416>.

Appreciate your help, Peter and Matt. Could you please have a quick look and give your comments?

Joe - Could you also check out the JIRA and let me know if I've committed some crime?

You guys are the best!

Best Regards,
Arun

On Mon, Sep 25, 2017 at 9:44 AM Arun Manivannan <a...@arunma.com> wrote:

> Thanks a lot, gentlemen. JIRA and PR coming through in a few hours.
>
> On Mon, Sep 25, 2017, 09:07 Matt Burgess <mattyb...@gmail.com> wrote:
>
>> Thanks all, if the PR is available tomorrow I can review as well and
>> merge, but I will be on vacation for a week after that. No pressure :)
>>
>> Regards,
>> Matt
>>
>> > On Sep 24, 2017, at 8:57 PM, Joe Witt <joe.w...@gmail.com> wrote:
>> >
>> > Thanks Arun and Peter. Getting that resolved will be nice. The
>> > performance difference of the record reader/writer approach in all
>> > this is pretty fantastic so the more we can do to iron out these sorts
>> > of edges the better. Thanks!
>> >
>> >> On Sun, Sep 24, 2017 at 8:56 PM, Peter Wicks (pwicks) <pwi...@micron.com> wrote:
>> >> Arun,
>> >>
>> >> I'm also using Ctrl+A as a delimiter and had the same problem. I
>> >> haven't had time to write up a PR but it looked like a pretty easy
>> >> fix to me too.
>> >>
>> >> I can't merge the change if you submit it, but I'd be happy to
>> >> review it.
>> >>
>> >> --Peter
>> >>
>> >> -----Original Message-----
>> >> From: Arun Manivannan [mailto:a...@arunma.com]
>> >> Sent: Sunday, September 24, 2017 11:17 PM
>> >> To: Dev@nifi.apache.org
>> >> Subject: [EXT] ConvertCSVToAvro vs CSVReader - Value Delimiter
>> >>
>> >> Hi,
>> >>
>> >> The ConvertCSVToAvro processor has been having performance issues
>> >> while processing files larger than a GB, and I was advised to use
>> >> ConvertRecord, which leverages the RecordReader and Writer. Did some
>> >> tests and they do perform well.
>> >>
>> >> Strangely, the CSVReader doesn't accept a Unicode character as the
>> >> value delimiter - the Control-A (\u0001) character is the delimiter
>> >> of my CSV.
>> >>
>> >> Did some analysis and I see that a minor change needs to be made in
>> >> CSVUtils to unescape the delimiter, like what ConvertCSVToAvro does,
>> >> and also to modify the SingleCharacterValidator.
>> >>
>> >> Please let me know if you believe this isn't an issue and there's a
>> >> workaround for this. Else, I am more than happy to raise an issue and
>> >> submit a PR for review.
>> >>
>> >> Best Regards,
>> >> Arun
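
For anyone following the thread, the idea in the original message (unescape the configured delimiter before checking that it is a single character) can be sketched roughly as below. This is only an illustration and not the patch in the PR: the class and method names are hypothetical, the hand-written unescape handles just a few escapes, and the real CSVUtils and SingleCharacterValidator in NiFi may look quite different.

```java
import java.util.regex.Pattern;

// Hypothetical sketch; none of these names come from the NiFi code base.
class DelimiterUnescapeSketch {

    // Converts backslash-u hex escapes, "\\t" and "\\\\" in the configured
    // text into the characters they denote. Anything else is copied as-is.
    // A malformed hex escape raises NumberFormatException.
    static String unescape(String raw) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < raw.length()) {
            char c = raw.charAt(i);
            if (c == '\\' && i + 1 < raw.length()) {
                char next = raw.charAt(i + 1);
                if (next == 'u' && i + 6 <= raw.length()) {
                    // Four hex digits follow the 'u'.
                    String hex = raw.substring(i + 2, i + 6);
                    out.append((char) Integer.parseInt(hex, 16));
                    i += 6;
                    continue;
                }
                if (next == 't') {
                    out.append('\t');
                    i += 2;
                    continue;
                }
                if (next == '\\') {
                    out.append('\\');
                    i += 2;
                    continue;
                }
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    // Validation step: the value is acceptable only if it is exactly
    // one character long AFTER unescaping.
    static boolean isSingleCharacter(String configured) {
        return unescape(configured).length() == 1;
    }

    // Returns the delimiter character, or fails if the value is not
    // a single character once unescaped.
    static char toDelimiter(String configured) {
        String unescaped = unescape(configured);
        if (unescaped.length() != 1) {
            throw new IllegalArgumentException(
                "Delimiter must be a single character: " + configured);
        }
        return unescaped.charAt(0);
    }

    public static void main(String[] args) {
        // The six characters a user would type for Control-A.
        String configured = "\\u0001";
        char delimiter = toDelimiter(configured);

        // Split a sample record on the unescaped delimiter.
        String line = "a" + delimiter + "b" + delimiter + "c";
        String[] fields = line.split(Pattern.quote(String.valueOf(delimiter)));
        System.out.println(fields.length); // prints 3
    }
}
```

The point of the sketch is the ordering: a check on the raw text would reject the six-character escaped form, while a check on the unescaped value accepts it.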