[ https://issues.apache.org/jira/browse/CSV-141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15962812#comment-15962812 ]
Aaron Digulla commented on CSV-141:
-----------------------------------
Idea for the patch: When an error happens, the parser should advance the reader
to the next EOL. That way, code could handle the exception and call
{{getNextRecord()}} to resume with the next line.
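A minimal sketch of that recovery behaviour, written as a standalone toy parser rather than a patch to Commons CSV. The names ResumableCsvParser, CsvFormatException, nextRecord() and readAll() are hypothetical, and the sketch assumes one record per line (no line breaks inside quoted fields), comma delimiters and double-quote encapsulation:

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical exception type, so callers can tell format errors from real I/O failures. */
class CsvFormatException extends IOException {
    CsvFormatException(String message) {
        super(message);
    }
}

/** Hypothetical toy parser (not the Commons CSV API): on error it leaves the reader at the next line. */
class ResumableCsvParser {
    private final Reader in;
    private int lineNumber = 0;

    ResumableCsvParser(Reader in) {
        this.in = in;
    }

    /** Returns the next record, or null at end of input. Assumes one record per line. */
    List<String> nextRecord() throws IOException {
        int c = in.read();
        if (c == -1) {
            return null;
        }
        lineNumber++;
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;   // currently inside "..."
        boolean afterQuote = false; // a closing quote was just seen
        for (; c != -1 && c != '\n'; c = in.read()) {
            if (c == '\r') {
                continue; // drop carriage returns so CRLF input works
            }
            if (inQuotes) {
                if (c == '"') {
                    inQuotes = false;
                    afterQuote = true;
                } else {
                    field.append((char) c);
                }
            } else if (c == '"') {
                if (afterQuote) {
                    field.append('"'); // a doubled quote is a literal quote
                }
                inQuotes = true;
                afterQuote = false;
            } else if (c == ',') {
                fields.add(field.toString());
                field.setLength(0);
                afterQuote = false;
            } else if (afterQuote) {
                skipToEol(); // advance to the next EOL before reporting the error
                throw new CsvFormatException("(line " + lineNumber
                        + ") invalid char between encapsulated token and delimiter");
            } else {
                field.append((char) c);
            }
        }
        if (inQuotes) {
            // The EOL has already been consumed, so the next call starts on the next line.
            throw new CsvFormatException("(line " + lineNumber
                    + ") EOL reached before encapsulated token finished");
        }
        fields.add(field.toString());
        return fields;
    }

    private void skipToEol() throws IOException {
        int c;
        do {
            c = in.read();
        } while (c != -1 && c != '\n');
    }

    /** Caller-side recovery: record the error, then resume with the next line. */
    static List<List<String>> readAll(Reader in, List<String> errors) throws IOException {
        ResumableCsvParser parser = new ResumableCsvParser(in);
        List<List<String>> records = new ArrayList<>();
        while (true) {
            try {
                List<String> record = parser.nextRecord();
                if (record == null) {
                    return records;
                }
                records.add(record);
            } catch (CsvFormatException e) {
                errors.add(e.getMessage());
            }
        }
    }
}
```

The trade-off of the one-record-per-line assumption is that legitimate multi-line quoted fields would be reported as errors; in exchange, an unterminated quote cannot swallow the following lines.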
For cases where you have known bad input (= input with known bugs), you could
try to write a "patching" InputStream/Reader that fixes the problems and sends
sanitized input to CSVParser.
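A rough sketch of such a patching reader, using only java.io. The class name LinePatchingReader and the example fix closeDanglingQuote are hypothetical; the fix assumes that the known bug is a line with an odd number of double quotes, which will not hold for every input:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.function.UnaryOperator;

/** Hypothetical helper: applies a per-line fix before the CSV parser sees the text. */
class LinePatchingReader extends Reader {
    private final BufferedReader source;
    private final UnaryOperator<String> fix;
    private String current = ""; // patched line currently being served
    private int pos = 0;         // read position inside 'current'

    LinePatchingReader(Reader source, UnaryOperator<String> fix) {
        this.source = new BufferedReader(source);
        this.fix = fix;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        while (pos >= current.length()) {
            String line = source.readLine();
            if (line == null) {
                return -1; // end of the underlying input
            }
            // Line endings are normalized to '\n' as a side effect.
            current = fix.apply(line) + "\n";
            pos = 0;
        }
        int n = Math.min(len, current.length() - pos);
        current.getChars(pos, pos + n, buf, off);
        pos += n;
        return n;
    }

    @Override
    public void close() throws IOException {
        source.close();
    }

    /** Example fix (an assumption about the input): close a quote left open at end of line. */
    static String closeDanglingQuote(String line) {
        long quotes = line.chars().filter(ch -> ch == '"').count();
        return quotes % 2 == 0 ? line : line + '"';
    }

    /** Small helper that reads a Reader to the end, for trying the class out. */
    static String drain(Reader r) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[64];
        for (int n; (n = r.read(buf)) != -1; ) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }
}
```

Because it is an ordinary Reader, an instance such as new LinePatchingReader(reader, LinePatchingReader::closeDanglingQuote) could be handed to any parser that accepts a Reader, without changing the parser itself.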
If we take CSV-147 into account, then another approach would be to have error
handling strategies. CSV could supply a base set of error handling strategies
(give up/throw exception, try to patch the problem, advance to EOL and add
problem to an error report, etc.) that users could build on.
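One possible shape for such strategies, purely as an illustration. The interface CsvErrorStrategy, the factory methods and the accept() driver are all hypothetical and not part of Commons CSV; the driver uses a toy check (an odd number of double quotes) as a stand-in for a real parse error:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical strategy interface; not an existing Commons CSV type. */
interface CsvErrorStrategy {
    /** Returns a replacement line to retry, or empty to drop the line; may throw to give up. */
    Optional<String> onError(long lineNumber, String rawLine, String problem);
}

final class CsvErrorStrategies {
    private CsvErrorStrategies() {
    }

    /** Give up: throw on the first problem. */
    static CsvErrorStrategy failFast() {
        return (n, raw, problem) -> {
            throw new IllegalStateException("(line " + n + ") " + problem);
        };
    }

    /** Skip the bad line and add the problem to an error report. */
    static CsvErrorStrategy skipAndReport(List<String> report) {
        return (n, raw, problem) -> {
            report.add("(line " + n + ") " + problem);
            return Optional.empty();
        };
    }

    /** Try to patch the problem, here by closing a quote left open at end of line. */
    static CsvErrorStrategy patchDanglingQuote() {
        return (n, raw, problem) -> Optional.of(raw + "\"");
    }

    /** Toy driver standing in for a parser: a line counts as malformed if its quotes are unbalanced. */
    static List<String> accept(List<String> lines, CsvErrorStrategy strategy) {
        List<String> accepted = new ArrayList<>();
        long n = 0;
        for (String line : lines) {
            n++;
            if (isBalanced(line)) {
                accepted.add(line);
                continue;
            }
            Optional<String> patched = strategy.onError(n, line, "unbalanced quotes");
            // Keep a patched line only if the patch actually fixed it.
            patched.filter(CsvErrorStrategies::isBalanced).ifPresent(accepted::add);
        }
        return accepted;
    }

    static boolean isBalanced(String line) {
        return line.chars().filter(ch -> ch == '"').count() % 2 == 0;
    }
}
```

Since the interface has a single method, users could supply their own strategy as a lambda or wrap one of the base strategies, for example to report a problem first and then attempt a patch.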
> Handle malformed CSV files
> --------------------------
>
> Key: CSV-141
> URL: https://issues.apache.org/jira/browse/CSV-141
> Project: Commons CSV
> Issue Type: Wish
> Components: Parser
> Affects Versions: 1.0
> Reporter: Nguyen Minh
> Priority: Minor
> Fix For: 1.x
>
>
> My Java application has to handle thousands of CSV files uploaded by
> client phones every day. Some of these CSV files are malformed, and I'm
> not sure why.
> Here is my sample CSV. Microsoft Excel parses it correctly, but both Commons
> CSV and OpenCSV fail on it. OpenCSV can't parse line 2 (because of the '\'
> character), and Commons CSV fails on lines 3 and 4:
> "1414770317901","android.widget.EditText","pass sem1 _84*|*","0","pass sem1
> _8"
> "1414770318470","android.widget.EditText","pass sem1 _84:*|*","0","pass sem1
> _84:\"
> "1414770318327","android.widget.EditText","pass sem1
> "1414770318628","android.widget.EditText","pass sem1 _84*|*","0","pass sem1
> Line 3: java.io.IOException: (line 5) invalid char between encapsulated token
> and delimiter
> at org.apache.commons.csv.CSVParser$1.getNextRecord(CSVParser.java:398)
> at org.apache.commons.csv.CSVParser$1.hasNext(CSVParser.java:407)
> Line 4: java.io.IOException: (startline 5) EOF reached before encapsulated
> token finished
> at org.apache.commons.csv.CSVParser$1.getNextRecord(CSVParser.java:398)
> at org.apache.commons.csv.CSVParser$1.hasNext(CSVParser.java:407)