Jason Black created CAMEL-12698:
-----------------------------------

             Summary: Unmarshalling a CSV file with the NEL (next line) 
character will cause Bindy to misread the entire file
                 Key: CAMEL-12698
                 URL: https://issues.apache.org/jira/browse/CAMEL-12698
             Project: Camel
          Issue Type: Bug
          Components: camel-bindy
    Affects Versions: 2.22.0
            Reporter: Jason Black


I am using Apache Camel to process a lot of large CSV files, and relying on 
Bindy to assist with unmarshalling them into POJOs.

We have an upstream data bug which causes one of our records to contain the 
Unicode character 
[NEL|http://www.fileformat.info/info/unicode/char/85/index.htm]. While 
we're working through the cause of that, I was curious about what Bindy is 
actually doing with it.  We rely on the unmarshal process to perform a batch 
insert, and because our POJO is missing certain fields, we started observing 
that the 

Bindy relies on Scanner to read lines from a large file; however, Scanner 
also treats the NEL character as a line terminator, so a record that contains 
NEL is split into two lines at that point.  The modern Files API does not do 
this and splits lines only on the traditional terminators (e.g. \n, \r, or 
\r\n).
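
A minimal sketch of the difference, using only plain JDK classes rather than 
Bindy's own code. The class name, method names and sample record are 
hypothetical and exist only for illustration:

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.stream.Collectors;

public class NelLineSplitDemo {

    // Two CSV records; the first one contains a stray NEL (U+0085) in a field.
    static final String SAMPLE = "1,foo" + (char) 0x85 + "bar,x\n2,baz,y\n";

    // Scanner.nextLine() treats NEL as a line separator, so the first
    // record is broken in two.
    static List<String> readWithScanner(String csv) {
        List<String> lines = new ArrayList<>();
        try (Scanner scanner = new Scanner(csv)) {
            while (scanner.hasNextLine()) {
                lines.add(scanner.nextLine());
            }
        }
        return lines;
    }

    // BufferedReader only ends a line on \n, \r or \r\n, so NEL stays
    // inside the record as ordinary data.
    static List<String> readWithBufferedReader(String csv) {
        return new BufferedReader(new StringReader(csv))
                .lines()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println("Scanner lines:        " + readWithScanner(SAMPLE).size());
        System.out.println("BufferedReader lines: " + readWithBufferedReader(SAMPLE).size());
    }
}
```

With this sample, Scanner reports three lines while BufferedReader reports 
two, which matches the number of records actually present.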

There are two ways to fix this from what I've been able to smoke test:
 * Change the Scanner implementation to use a delimiter made up of only the 
more traditional newline characters
 * Use Java 8's Files API and stream the file in
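
Roughly, the two options could look like the sketch below. This is not a 
patch against camel-bindy; the class and method names are hypothetical, and 
UTF-8 is assumed as the file encoding:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class NelSafeLineReaders {

    // Option 1: keep Scanner, but tokenize on \r\n, \n or \r only,
    // instead of calling nextLine() (which also splits on NEL).
    static List<String> readWithRestrictedScanner(Path file) {
        List<String> lines = new ArrayList<>();
        try (Scanner scanner = new Scanner(file, "UTF-8")) {
            scanner.useDelimiter("\r\n|\n|\r");
            while (scanner.hasNext()) {
                lines.add(scanner.next());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    // Option 2: stream the file with the Java 8 Files API, which only
    // recognizes \n, \r and \r\n as line terminators.
    static List<String> readWithFilesApi(Path file) {
        try (Stream<String> stream = Files.lines(file, StandardCharsets.UTF_8)) {
            // Collected here for simplicity; a real consumer would process
            // the stream lazily to keep memory use flat on large files.
            return stream.collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Both methods should keep a record containing NEL on a single line. 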

I would personally want to use the Files API to handle this since it's more 
robust and capable of higher performance, but I'll explore both approaches and 
see where I end up.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
