Ruiqi Dong created CSV-326:
------------------------------

             Summary: CSVPrinter Reader printing with quote and escape can emit 
CSV that its parser cannot read back
                 Key: CSV-326
                 URL: https://issues.apache.org/jira/browse/CSV-326
             Project: Commons CSV
          Issue Type: Bug
            Reporter: Ruiqi Dong


*Summary*
When printing normal `CharSequence` values with both a quote character and an 
escape character configured, `CSVFormat#printWithQuotes(Object, CharSequence, 
...)` escapes both quote characters and escape characters.

The `Reader` path does not do the same. `CSVFormat#printWithQuotes(Reader, 
...)` only doubles quote characters and leaves escape characters unchanged. If 
the input stream contains an escape character immediately before a quote, the 
generated CSV can no longer be parsed by the same format.
 
*Affected code*
File: `src/main/java/org/apache/commons/csv/CSVFormat.java`
The `CharSequence` path handles both quote and escape:
{code:java}
if (c == quoteChar || c == escapeChar) {
    out.append(charSeq, start, pos);
    out.append(escapeChar);
    start = pos;
} {code}
The `Reader` path only handles quote:
{code:java}
private void printWithQuotes(final Reader reader, final Appendable appendable) throws IOException {
    if (getQuoteMode() == QuoteMode.NONE) {
        printWithEscapes(reader, appendable);
        return;
    }
    final char quote = getQuoteCharacter().charValue();
    append(quote, appendable);
    int c;
    while (EOF != (c = reader.read())) {
        append((char) c, appendable);
        if (c == quote) {
            append(quote, appendable);
        }
    }
    append(quote, appendable);
} {code}
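One possible direction for a fix, shown as a minimal standalone sketch rather than a patch: have the `Reader` loop prefix both quote and escape characters with the escape character, as the `CharSequence` snippet above does. The class `ReaderQuoteSketch` and its `quote` method are hypothetical names used only for illustration; they are not part of Commons CSV. The sketch assumes the caller passes the quote character as `escape` when no escape character is configured, so quotes are still doubled in that case.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class ReaderQuoteSketch {

    // Hypothetical helper, not part of Commons CSV.
    // Assumption: when no escape character is configured, the caller passes
    // the quote character as 'escape', which results in doubled quotes.
    static String quote(final Reader reader, final char quote, final char escape) {
        final StringBuilder out = new StringBuilder();
        out.append(quote); // opening quote
        try {
            int c;
            while ((c = reader.read()) != -1) {
                if (c == quote || c == escape) {
                    // Prefix quote and escape characters with the escape character.
                    out.append(escape);
                }
                out.append((char) c);
            }
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
        out.append(quote); // closing quote
        return out.toString();
    }

    public static void main(final String[] args) {
        // Two-character input: a backslash followed by a quote.
        System.out.println(quote(new StringReader("\\\""), '"', '\\')); // prints "\\\""
    }
}
```

Whether the real fix should stream like this or reuse the existing `CharSequence` logic is a design choice for the maintainers; the sketch only shows that per-character escaping does not require buffering the whole value.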
*Reproducer*
Add this test to `src/test/java/org/apache/commons/csv/CSVPrinterTest.java`:
{code:java}
@Test
void testPrintReaderWithQuoteAndEscapeRoundTripsEscapeBeforeQuote() throws IOException {
    final CSVFormat format = CSVFormat.DEFAULT.builder()
            .setEscape(BACKSLASH)
            .setQuote('"')
            .get();
    final StringWriter sw = new StringWriter();
    try (CSVPrinter printer = new CSVPrinter(sw, format)) {
        printer.printRecord(new StringReader("\\\""));
    }

    try (CSVParser parser = format.parse(new StringReader(sw.toString()))) {
        assertEquals("\\\"", parser.getRecords().get(0).get(0));
    }
} {code}
Run:
{code:bash}
mvn -q -Dtest=org.apache.commons.csv.CSVPrinterTest#testPrintReaderWithQuoteAndEscapeRoundTripsEscapeBeforeQuote test
{code}
Observed behavior:
The parser cannot read the printer's output:
{code:java}
java.io.UncheckedIOException: org.apache.commons.csv.CSVException:
(startline 1) EOF reached before encapsulated token finished {code}
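To see why the parser runs off the end of the input, it can help to trace what the quoted `Reader` loop emits for the reproducer value. The standalone sketch below copies that loop's logic into a hypothetical class, `ReaderPathTrace`, so it runs without Commons CSV; the class and method names are illustrative only.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class ReaderPathTrace {

    // Same logic as the quoted Reader loop in the report: copy each character
    // and double quote characters; escape characters pass through unchanged.
    static String quoteLikeReaderPath(final Reader reader, final char quote) {
        final StringBuilder out = new StringBuilder();
        out.append(quote);
        try {
            int c;
            while ((c = reader.read()) != -1) {
                out.append((char) c);
                if (c == quote) {
                    out.append(quote);
                }
            }
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
        out.append(quote);
        return out.toString();
    }

    public static void main(final String[] args) {
        // Same two-character input as the reproducer: backslash, then quote.
        System.out.println(quoteLikeReaderPath(new StringReader("\\\""), '"')); // prints "\"""
    }
}
```

With backslash configured as the escape character, the emitted text `"\"""` is presumably read as an opening quote, an escaped quote (`\"`), and a doubled quote (`""`), which leaves no closing quote before EOF. That reading is consistent with the exception above.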
*Expected behavior*
Printing a `Reader` value should preserve the same escaping invariants as 
printing the equivalent `String` value. In particular, if an escape character 
is configured, the quoted `Reader` path should not leave escape characters 
unescaped when doing so changes how following quotes are parsed.
 
This is a semantic mismatch between two printer paths for the same logical 
value. A streaming `Reader` value should not produce CSV that is less valid 
than the corresponding in-memory `CharSequence` value under the same 
`CSVFormat`.
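The round-trip invariant can be illustrated with a deliberately simplified model of reading one quoted field. This is not the Commons CSV lexer. The hypothetical `QuotedFieldModel.read` below assumes only three rules: an escape character followed by any character yields that character, a doubled quote yields one quote, and a single quote closes the field.

```java
final class QuotedFieldModel {

    // Simplified model of reading one quoted field; NOT the Commons CSV lexer.
    static String read(final String csv, final char quote, final char escape) {
        if (csv.isEmpty() || csv.charAt(0) != quote) {
            throw new IllegalArgumentException("field must start with the quote character");
        }
        final StringBuilder value = new StringBuilder();
        int i = 1; // skip the opening quote
        while (i < csv.length()) {
            final char c = csv.charAt(i);
            if (c == escape && escape != quote && i + 1 < csv.length()) {
                value.append(csv.charAt(i + 1)); // escaped character is taken literally
                i += 2;
            } else if (c == quote) {
                if (i + 1 < csv.length() && csv.charAt(i + 1) == quote) {
                    value.append(quote); // doubled quote yields one quote
                    i += 2;
                } else {
                    return value.toString(); // closing quote
                }
            } else {
                value.append(c);
                i++;
            }
        }
        throw new IllegalStateException("EOF reached before quoted field finished");
    }

    public static void main(final String[] args) {
        // Escape and quote both escaped: reads back as backslash + quote.
        System.out.println(read("\"\\\\\\\"\"", '"', '\\')); // prints \"
        try {
            // Only the quote doubled, backslash left as-is.
            read("\"\\\"\"\"", '"', '\\');
        } catch (final IllegalStateException e) {
            System.out.println(e.getMessage()); // prints EOF reached before quoted field finished
        }
    }
}
```

Under these assumed rules, the form with both characters escaped (`"\\\""`) reads back as the original two characters, while the form with only the quote doubled (`"\"""`) never finds a closing quote.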
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)