[
https://issues.apache.org/jira/browse/CSV-290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17492340#comment-17492340
]
Angus C commented on CSV-290:
-----------------------------
Basically, the "EOF reached" error always happens when quote-char = escape-char.
Given the input string ("a"), Lexer.java treats the second (") as an escape
char, reads the unescaped \r, and then complains about the missing ending
quote (").
{code:java}
CSVFormat.Builder.create().setEscape('"').build()
    .parse(new StringReader("\"a\"")).getRecords();
{code}
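To make the failure mode easier to follow, here is a minimal sketch of a quoted-field reader that checks the escape char before the quote char. It is a simplified model for illustration only, not the actual Lexer.java: the class name, the method, and the use of UncheckedIOException are all assumptions of this sketch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

/** Simplified model of a quoted-field reader; NOT the real Lexer.java. */
final class EscapeEqualsQuoteDemo {

    /**
     * Reads one quoted field. Assumes input starts with the opening quote.
     * The escape char is tested before the quote char.
     */
    static String readQuoted(String input, char quote, char escape) {
        StringBuilder sb = new StringBuilder();
        int i = 1; // skip the opening quote
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == escape) {
                // Escape branch wins: take the next char literally, if any.
                if (i + 1 < input.length()) {
                    sb.append(input.charAt(i + 1));
                }
                i += 2;
            } else if (c == quote) {
                return sb.toString(); // closing quote found
            } else {
                sb.append(c);
                i++;
            }
        }
        // The closing quote was never seen as a quote.
        throw new UncheckedIOException(
                new IOException("EOF reached before encapsulated token finished"));
    }

    public static void main(String[] args) {
        // Distinct escape char: the field is read normally.
        System.out.println(readQuoted("\"a\"", '"', '\\')); // prints a
        try {
            // escape == quote: the closing quote is consumed as an escape.
            readQuoted("\"a\"", '"', '"');
        } catch (UncheckedIOException e) {
            System.out.println("failed: " + e.getCause().getMessage());
        }
    }
}
```

With a distinct escape char the closing quote ends the field; with escape == quote the same character is swallowed by the escape branch and the reader runs off the end of the input.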
I think setEscape() is meant for escaping special chars like \r, \t, etc., as
in Lexer.readEscape(), but not the quote-char. The quote-char should always be
escaped by the quote-char, not the escape-char.
Your fix disables the escape-char inside a quoted string when it is equal to
the quote-char. That can serve as a fail-safe, but I think we should remove the
.setEscape(DOUBLE_QUOTE_CHAR) in POSTGRESQL_CSV. The javadoc says "special
characters are escaped with quote", but I doubt that is correct.
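For comparison, a rough sketch of what such a fail-safe could look like in a simplified reader: the escape branch is skipped inside the quoted string when the escape char equals the quote char, and a doubled quote is read as a literal quote. This is only an illustration under those assumptions, not the actual patch; the class and method names are hypothetical.

```java
/** Simplified model of the fail-safe idea; NOT the real patch to Lexer.java. */
final class QuoteAwareReaderDemo {

    /** Reads one quoted field. Assumes input starts with the opening quote. */
    static String readQuoted(String input, char quote, char escape) {
        StringBuilder sb = new StringBuilder();
        // Fail-safe: ignore the escape char when it is the same as the quote char.
        boolean escapeActive = escape != quote;
        int i = 1; // skip the opening quote
        while (i < input.length()) {
            char c = input.charAt(i);
            if (escapeActive && c == escape && i + 1 < input.length()) {
                sb.append(input.charAt(i + 1)); // escaped char taken literally
                i += 2;
            } else if (c == quote) {
                if (i + 1 < input.length() && input.charAt(i + 1) == quote) {
                    sb.append(quote); // doubled quote is a literal quote
                    i += 2;
                } else {
                    return sb.toString(); // closing quote
                }
            } else {
                sb.append(c);
                i++;
            }
        }
        throw new IllegalStateException("EOF reached before encapsulated token finished");
    }

    public static void main(String[] args) {
        System.out.println(readQuoted("\"a\"", '"', '"'));       // prints a
        System.out.println(readQuoted("\"a\"\"b\"", '"', '"'));  // prints a"b
    }
}
```

In this model the quote-char is escaped only by the quote-char, which matches the behaviour argued for above.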
> Produced CSV using PostgreSQL format cannot be read
> ---------------------------------------------------
>
> Key: CSV-290
> URL: https://issues.apache.org/jira/browse/CSV-290
> Project: Commons CSV
> Issue Type: Bug
> Components: Parser
> Affects Versions: 1.6, 1.9.0
> Reporter: Anatoliy Artemenko
> Priority: Major
>
> CSV, produced using printer:
>
> CSVPrinter printer = new CSVPrinter(sw,
> CSVFormat.POSTGRESQL_CSV.withFirstRecordAsHeader());
>
> cannot be read with the same format parser:
>
> CSVParser parser = new CSVParser(new StringReader(sw.toString()),
> CSVFormat.POSTGRESQL_CSV.withFirstRecordAsHeader());
>
> To reproduce:
>
> {code:java}
> StringWriter sw = new StringWriter();
> CSVPrinter printer = new CSVPrinter(sw,
> CSVFormat.POSTGRESQL_CSV.withFirstRecordAsHeader());
> printer.printRecord("column1", "column2");
> printer.printRecord("v11", "v12");
> printer.printRecord("v21", "v22");
> printer.close();
> CSVParser parser = new CSVParser(new StringReader(sw.toString()),
> CSVFormat.POSTGRESQL_CSV.withFirstRecordAsHeader());
> System.out.println("headers: " +
> Arrays.equals(parser.getHeaderNames().toArray(), new String[] {"column1",
> "column2"}));
> Iterator<CSVRecord> i = parser.iterator();
> System.out.println("row: " + Arrays.equals(i.next().toList().toArray(), new
> String[] {"v11", "v12"}));
> System.out.println("row: " + Arrays.equals(i.next().toList().toArray(), new
> String[] {"v21", "v22"}));{code}
> I'd expect the above code to work, but it fails:
> {code:java}
> java.io.IOException: (startline 1) EOF reached before encapsulated token finished
> at org.apache.commons.csv.Lexer.parseEncapsulatedToken(Lexer.java:371)
> at org.apache.commons.csv.Lexer.nextToken(Lexer.java:285)
> at org.apache.commons.csv.CSVParser.nextRecord(CSVParser.java:701)
> at org.apache.commons.csv.CSVParser.createHeaders(CSVParser.java:480)
> at org.apache.commons.csv.CSVParser.<init>(CSVParser.java:432)
> at org.apache.commons.csv.CSVParser.<init>(CSVParser.java:398)
> at Test.main(Test.java:25)
> {code}
>