The CSVPrinter escaping inconsistent with CSVParser
---------------------------------------------------
Key: SANDBOX-308
URL: https://issues.apache.org/jira/browse/SANDBOX-308
Project: Commons Sandbox
Issue Type: Bug
Components: CSV
Reporter: Colin Goodheart-Smithe
Priority: Minor
The CSVPrinter escapes new line and return characters to "\n" and "\r" if these
occur within the encapsulators (this happens in the
CSVPrinter.escapeAndQuote(String) method). However, the CSVParser does not
convert these back to new line and return characters in the same fashion. So
if you use the CSVPrinter to create a delimited file containing new line or
return characters within an entry and then read this file using the CSVParser,
the text read in by the CSVParser will not match the text written by the
CSVPrinter (the difference being that every new line and return character will
be replaced by the two-character sequences "\n" and "\r" respectively).
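The mismatch can be illustrated without the sandbox classes. The sketch below
is a stand-alone model, not the Commons CSV code: escapeAndQuote here is a
hypothetical helper that only mimics the behaviour described above (quoting
the value and writing line breaks as backslash sequences), and readQuoted is a
hypothetical stand-in for a reader that strips the encapsulators but leaves
the escapes untouched.

```java
public class RoundTripMismatch {

    // Hypothetical model of the printer side: wrap the value in quotes and
    // write new line / return characters as two-character backslash sequences.
    static String escapeAndQuote(String value) {
        StringBuilder out = new StringBuilder("\"");
        for (char c : value.toCharArray()) {
            switch (c) {
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                default:   out.append(c);
            }
        }
        return out.append('"').toString();
    }

    // Hypothetical model of the parser side as reported: the encapsulators
    // are removed, but the backslash sequences are not converted back.
    static String readQuoted(String field) {
        return field.substring(1, field.length() - 1);
    }

    public static void main(String[] args) {
        String original = "line one\nline two";
        String readBack = readQuoted(escapeAndQuote(original));
        System.out.println(original.equals(readBack)); // prints "false"
    }
}
```

The value read back contains a backslash followed by 'n' where the original
had a single new line character, which is the difference reported here.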
A possible fix for this would be to add two extra 'else if' statements to
CSVParser.encapsulatedTokenLexer(Token, int) starting at line 49, as detailed
below (the _emphasised_ text indicates the changes):
else if (c == '\\' && in.lookAhead() == '\\')
{
    // doubled escape char, it does not escape itself, only encapsulator
    // -> add both escape chars to stream
    tkn.content.append((char) c);
    c = in.read();
    tkn.content.append((char) c);
}
_else if (c == '\\' && in.lookAhead() == 'n')_
_{_
    _// escaped java new line character, append a new line character_
    _tkn.content.append('\n');_
    _c = in.read();_
_}_
_else if (c == '\\' && in.lookAhead() == 'r')_
_{_
    _// escaped java return character, append a return character_
    _tkn.content.append('\r');_
    _c = in.read();_
_}_
else if (strategy.getUnicodeEscapeInterpretation() && c == '\\'
        && in.lookAhead() == 'u')
{
    // interpret unicode escaped chars (like \u0070 -> p)
    tkn.content.append((char) unicodeEscapeLexer(c));
}
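The effect of the two added branches can be tried in isolation. The following
is a minimal sketch that works on a plain String rather than on the parser's
input stream; the class name UnescapeSketch and the method unescape are
hypothetical and are not part of CSVParser. It assumes the same branch order as
the proposal: doubled backslash first, then the new line and return escapes.

```java
public class UnescapeSketch {

    // Convert backslash-n and backslash-r sequences back to the characters
    // they stand for. A doubled backslash is copied through unchanged.
    static String unescape(String content) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < content.length(); i++) {
            char c = content.charAt(i);
            // -1 means there is no following character to look ahead to
            int next = (i + 1 < content.length()) ? content.charAt(i + 1) : -1;
            if (c == '\\' && next == '\\') {
                // doubled escape char: keep both characters
                out.append(c).append((char) next);
                i++;
            } else if (c == '\\' && next == 'n') {
                // escaped new line: append a real new line character
                out.append('\n');
                i++;
            } else if (c == '\\' && next == 'r') {
                // escaped return: append a real return character
                out.append('\r');
                i++;
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String readBack = unescape("line one\\nline two");
        System.out.println(readBack.equals("line one\nline two")); // prints "true"
    }
}
```

The doubled-backslash check comes first so that the second backslash of a pair
is never taken as the start of a new line or return escape.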