[https://issues.apache.org/jira/browse/CSV-141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17652125#comment-17652125]
Damjan Jovanovic edited comment on CSV-141 at 12/27/22 5:43 AM:
----------------------------------------------------------------
This kind of patch:
{code:java}
diff --git a/src/main/java/org/apache/commons/csv/Lexer.java b/src/main/java/org/apache/commons/csv/Lexer.java
index fd60b5ac..177f56d6 100644
--- a/src/main/java/org/apache/commons/csv/Lexer.java
+++ b/src/main/java/org/apache/commons/csv/Lexer.java
@@ -378,9 +378,15 @@ final class Lexer implements Closeable {
                     }
                 }
             } else if (isEndOfFile(c)) {
-                // error condition (end of file before end of token)
-                throw new IOException("(startline " + startLineNumber +
-                        ") EOF reached before encapsulated token finished");
+                if (allowTrailingText) {
+                    token.type = EOF;
+                    token.isReady = true; // There is data at EOF
+                    return token;
+                } else {
+                    // error condition (end of file before end of token)
+                    throw new IOException("(startline " + startLineNumber +
+                            ") EOF reached before encapsulated token finished");
+                }
             } else {
                 // consume character
                 token.content.append((char) c);
{code}
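makes the EOF-implicitly-closes-encapsulated-field behavior work too: with the flag on, a quoted field that is still open at end of file is returned as the last token instead of triggering an IOException. With that change, the CSV snippet in the issue description is parsed the same way Excel parses it.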
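To see the changed branch in isolation, here is a minimal standalone sketch in plain Java. It assumes a reader positioned just after an opening quote and RFC 4180 quoting (a doubled quote is a literal quote, no escape character). The class `EncapsulatedFieldSketch`, the method `readEncapsulated`, and its `eofClosesField` parameter are hypothetical illustrations, not Commons CSV's `Lexer`: in lenient mode a field still open at EOF is returned as-is, and in strict mode it fails with the same message as the original code.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/**
 * Hypothetical sketch of the branch the patch changes: reading the body of an
 * encapsulated (quoted) field and deciding what to do when EOF arrives first.
 * This is not the Commons CSV Lexer; all names are illustrative.
 */
public final class EncapsulatedFieldSketch {

    /**
     * Reads characters after an opening quote until the closing quote.
     * A doubled quote ("") is treated as one literal quote.
     *
     * @param in             reader positioned just after the opening quote
     * @param eofClosesField true: EOF implicitly closes the field (lenient);
     *                       false: EOF inside the quotes is an error (strict)
     */
    static String readEncapsulated(Reader in, boolean eofClosesField) throws IOException {
        StringBuilder content = new StringBuilder();
        while (true) {
            int c = in.read();
            if (c == '"') {
                int next = in.read();
                if (next == '"') {
                    content.append('"'); // escaped quote inside the field
                    continue;
                }
                // Closing quote reached. A real lexer would now inspect `next`
                // (delimiter, line break, or trailing text); this sketch stops here.
                return content.toString();
            } else if (c == -1) {
                if (eofClosesField) {
                    return content.toString(); // keep the partial field, like the patch
                }
                throw new IOException("EOF reached before encapsulated token finished");
            } else {
                content.append((char) c); // ordinary character
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Last field of line 4 in the issue's sample, after its opening quote.
        String tail = "pass sem1";
        System.out.println("lenient: " + readEncapsulated(new StringReader(tail), true));
        try {
            readEncapsulated(new StringReader(tail), false);
        } catch (IOException e) {
            System.out.println("strict: " + e.getMessage());
        }
    }
}
```

With the input `pass sem1` (the unterminated last field of line 4 in the sample), lenient mode returns `pass sem1`, while strict mode reports `EOF reached before encapsulated token finished`.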
I am not sure whether this should be activated by the same flag
(allowTrailingText) as my PR, or whether it should be a separate setting users
can toggle on and off. [~ggregory]?
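One way to frame the question is to keep each leniency as its own switch and add a preset that enables both, so users who just want Excel-like behavior flip one thing. A minimal sketch, assuming immutable options; `LeniencyOptions`, its fields, and the `STRICT`/`LENIENT` presets are hypothetical names, not part of the Commons CSV API, and "trailing text after a closing quote" is assumed to be what the existing `allowTrailingText` PR covers.

```java
/**
 * Hypothetical sketch: two independent leniency switches plus presets.
 * These names are illustrative and are not Commons CSV API.
 */
public final class LeniencyOptions {

    /** Accept text between a closing quote and the next delimiter (assumed PR behavior). */
    public final boolean allowTrailingText;

    /** Let EOF implicitly close an open encapsulated field instead of throwing. */
    public final boolean eofClosesEncapsulatedField;

    /** Both checks enforced, matching the current strict behavior. */
    public static final LeniencyOptions STRICT = new LeniencyOptions(false, false);

    /** Both leniencies enabled, intended for Excel-tolerant input. */
    public static final LeniencyOptions LENIENT = new LeniencyOptions(true, true);

    private LeniencyOptions(boolean allowTrailingText, boolean eofClosesEncapsulatedField) {
        this.allowTrailingText = allowTrailingText;
        this.eofClosesEncapsulatedField = eofClosesEncapsulatedField;
    }

    /** Returns a copy with only the trailing-text switch changed. */
    public LeniencyOptions withAllowTrailingText(boolean on) {
        return new LeniencyOptions(on, eofClosesEncapsulatedField);
    }

    /** Returns a copy with only the EOF switch changed. */
    public LeniencyOptions withEofClosesEncapsulatedField(boolean on) {
        return new LeniencyOptions(allowTrailingText, on);
    }

    public static void main(String[] args) {
        // Tolerate trailing text but still reject a file that ends inside quotes.
        LeniencyOptions opts = STRICT.withAllowTrailingText(true);
        System.out.println(opts.allowTrailingText + " " + opts.eofClosesEncapsulatedField);
    }
}
```

Separate switches let a caller tolerate trailing text while still failing fast on a truncated file, which a single shared flag cannot express; the preset keeps the common "behave like Excel" case to one setting.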
> Handle malformed CSV files
> --------------------------
>
> Key: CSV-141
> URL: https://issues.apache.org/jira/browse/CSV-141
> Project: Commons CSV
> Issue Type: Wish
> Components: Parser
> Affects Versions: 1.0
> Reporter: Nguyen Minh
> Priority: Minor
> Fix For: 1.x
>
>
> My Java application has to handle thousands of CSV files uploaded by
> client phones every day. Some of these files are malformed, and I'm not sure why.
> Here is my sample CSV. Microsoft Excel parses it correctly, but neither Commons
> CSV nor OpenCSV can. OpenCSV can't parse line 2 (because of the '\'
> character), and Commons CSV throws an IOException on lines 3 and 4:
> "1414770317901","android.widget.EditText","pass sem1 _84*|*","0","pass sem1 _8"
> "1414770318470","android.widget.EditText","pass sem1 _84:*|*","0","pass sem1 _84:\"
> "1414770318327","android.widget.EditText","pass sem1
> "1414770318628","android.widget.EditText","pass sem1 _84*|*","0","pass sem1
> Line 3: java.io.IOException: (line 5) invalid char between encapsulated token and delimiter
>     at org.apache.commons.csv.CSVParser$1.getNextRecord(CSVParser.java:398)
>     at org.apache.commons.csv.CSVParser$1.hasNext(CSVParser.java:407)
> Line 4: java.io.IOException: (startline 5) EOF reached before encapsulated token finished
>     at org.apache.commons.csv.CSVParser$1.getNextRecord(CSVParser.java:398)
>     at org.apache.commons.csv.CSVParser$1.hasNext(CSVParser.java:407)