[
https://issues.apache.org/jira/browse/SANDBOX-206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12555992#action_12555992
]
Yonik Seeley commented on SANDBOX-206:
--------------------------------------
It's interesting... looking at CSVParserTest.testBackslashEscaping(), much of
this is deliberate:
{code}
String code =
"one,two,three\n"
+ "on\\\"e,two\n"
+ "on\"e,two\n"
+ "one,\"tw\\\"o\"\n"
+ "one,\"t\\,wo\"\n"
+ "one,two,\"th,ree\"\n"
+ "\"a\\\\\"\n"
+ "a\\,b\n"
+ "\"a\\\\,b\"";
String[][] res = {
{ "one", "two", "three" },
{ "on\\\"e", "two" },
{ "on\"e", "two" },
{ "one", "tw\"o" },
{ "one", "t\\,wo" }, // backslash in quotes only escapes a delimiter (",")
{ "one", "two", "th,ree" },
{ "a\\\\" }, // backslash in quotes only escapes a delimiter (",")
{ "a\\", "b" }, // a backslash must be returned
{ "a\\\\,b" } // backslash in quotes only escapes a delimiter (",")
};
{code}
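One way to read the expected results above is as a small set of rules: outside quotes a backslash and a quote are ordinary characters, while inside quotes a backslash followed by a quote yields just the quote and a backslash followed by anything else is kept together with that character. The sketch below implements only that reading for a single line. The class name BackslashEscapeSketch and the method parseLine are hypothetical, the rules are inferred from the test data rather than taken from the CSVParser source, and the exception messages are not the parser's own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; NOT the commons CSVParser implementation.
class BackslashEscapeSketch {

    // Splits one line on ',' using rules inferred from testBackslashEscaping().
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        int i = 0;
        int n = line.length();
        while (true) {
            field.setLength(0);
            if (i < n && line.charAt(i) == '"') {
                i++; // skip the opening quote
                boolean closed = false;
                while (i < n) {
                    char c = line.charAt(i);
                    if (c == '\\' && i + 1 < n) {
                        char next = line.charAt(i + 1);
                        if (next == '"') {
                            field.append('"'); // \" inside quotes becomes "
                        } else {
                            field.append(c).append(next); // pair is kept as-is
                        }
                        i += 2;
                    } else if (c == '"') {
                        i++; // closing quote
                        closed = true;
                        break;
                    } else {
                        field.append(c);
                        i++;
                    }
                }
                if (!closed) {
                    throw new IllegalArgumentException("unterminated quoted field");
                }
                if (i < n && line.charAt(i) != ',') {
                    throw new IllegalArgumentException("invalid char after closing quote");
                }
            } else {
                // Unquoted field: backslash and quote are plain characters here.
                while (i < n && line.charAt(i) != ',') {
                    field.append(line.charAt(i));
                    i++;
                }
            }
            fields.add(field.toString());
            if (i >= n) {
                break;
            }
            i++; // skip the delimiter
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parseLine("one,\"tw\\\"o\""));
        System.out.println(parseLine("a\\,b"));
    }
}
```

Under these assumed rules, the field from the issue description, "This is text with a \""quoted"" string", is rejected: the \" is consumed as an escaped quote, the next quote closes the field, and the q that follows is neither a delimiter nor the end of the line.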
Does anyone have knowledge of why this particular escaping mechanism is used?
> backslash before quote character gives an error
> -----------------------------------------------
>
> Key: SANDBOX-206
> URL: https://issues.apache.org/jira/browse/SANDBOX-206
> Project: Commons Sandbox
> Issue Type: Bug
> Components: CSV
> Environment: Windows, SOLR 1.2
> Reporter: Michael Lackhoff
>
> A CSV-field with the contents "This is text with a \""quoted"" string" gives
> the error
> "invalid char between encapsualted token end delimiter". If the backslash is
> not immediately before the double quote, everything is fine.
> The same error occurs when the backslash is the last character in the field
> (directly before the delimiter), like:
> "This is a text with a backslash \".
> Here the reason might be that the backslash also works as an escape character
> like in
> "This is a field with a \"quoted\" text" (no error, just the quotes in the
> resulting field)