[
https://issues.apache.org/jira/browse/CSV-294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Joern Huxhorn updated CSV-294:
------------------------------
Description:
Reading data that contains {{"}} fails with an exception in the Lexer if the
escape character is *manually set to {{'"'}}* as specified in [RFC
4180|https://datatracker.ietf.org/doc/html/rfc4180].
*It works for other escape characters or if no escape character is explicitly
defined in the format.*
This line in {{Lexer.java}} is responsible for the originally quite erroneous
ticket:
{{this.escape = mapNullToDisabled(format.getEscapeCharacter());}}
From this line I (wrongly) deduced that an unspecified escape character would
actually disable escaping. Because of that I wanted to enable it by setting it
to {{'"'}}, which causes exceptions in the Lexer for perfectly valid input.
That in turn convinced me that this is a way bigger issue than it is. Sorry
about that.
I don't think that the current situation is ideal, though.
I would not have been this confused if {{CSVFormat}} were more explicit
about the escape char that will be used, i.e. if {{toString()}} showed the
implicitly used quote character or stated, in case of {{null}}, that the
quote character is used instead. The escape character is currently omitted
from the output if it is not set explicitly.
There is also no documentation about what {{null}} as escape character actually
means - it may be documented somewhere but isn't documented for
{{CSVFormat.getEscapeCharacter()}} or {{CSVFormat.Builder.set/getEscape()}}
methods.
And setting the escape character explicitly to the value specified in the RFC
should certainly not fail, even if setting it to that value is superfluous
since {{null}} behaves exactly the same.
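
To make the {{toString()}} suggestion above concrete, here is a minimal sketch of how an unset escape character could be reported. Everything in it is hypothetical: {{describeEscape}} is not part of Commons CSV, the output wording is made up, and it only assumes what the ticket states, namely that a {{null}} escape character behaves the same as the quote character.

```java
class EscapeDescriptionSketch {

    /**
     * Hypothetical helper, not part of Commons CSV.
     * Describes which character effectively escapes quotes.
     */
    public static String describeEscape(Character escape, Character quote) {
        if (escape != null) {
            // An explicitly configured escape character is reported as-is.
            return "Escape=<" + escape + ">";
        }
        if (quote != null) {
            // Assumption from the ticket: null means the quote character is used.
            return "Escape=<" + quote + "> (implicit, same as quote character)";
        }
        // Neither escape nor quote character is set.
        return "Escape=<none>";
    }

    public static void main(String[] args) {
        System.out.println(describeEscape(null, '"'));
        System.out.println(describeEscape('\\', '"'));
    }
}
```

With output like this, a reader of the format's string form would see that quotes are still escaped when no escape character was set.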
h4. Relevant part of the RFC:
7. If double-quotes are used to enclose fields, then a double-quote
appearing inside a field must be escaped by preceding it with
another double quote. For example:
"aaa","b""bb","ccc"
h4. Related issue:
https://issues.apache.org/jira/browse/CSV-150
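
As a rough illustration of the RFC rule quoted above, the following stand-alone sketch splits a single CSV line and treats a doubled {{"}} inside a quoted field as one literal quote. It is not the Commons CSV {{Lexer}}; the class and method names are made up for this example, and it assumes well-formed single-line input (no embedded line breaks, no unterminated quotes).

```java
import java.util.ArrayList;
import java.util.List;

class Rfc4180QuoteSketch {

    /** Splits one CSV line into fields; "" inside a quoted field yields a literal ". */
    public static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // doubled quote: keep one literal quote
                        i++;                 // skip the second quote of the pair
                    } else {
                        inQuotes = false;    // single quote: end of quoted field
                    }
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote of a field
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);        // start the next field
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        // The example line from the RFC excerpt.
        System.out.println(parseLine("\"aaa\",\"b\"\"bb\",\"ccc\""));
        // prints [aaa, b"bb, ccc]
    }
}
```

Here the quote character doubles as its own escape character, which is the behavior the ticket expects when the escape character is explicitly set to {{'"'}}.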
was:
Reading data that contains " does not work if escape character is manually set
to {{'"'}} as specified in [RFC
4180|https://datatracker.ietf.org/doc/html/rfc4180]. It works for other escape
characters or if no escape character is defined in the format.
{{CSVFormat.DEFAULT}} or at least {{CSVFormat.RFC4180}} and {{CSVFormat.EXCEL}}
should have escape character set to '"' instead of {{null}} by default.
This line in {{Lexer.java}} is responsible for the originally quite erroneous
ticket:
{{this.escape = mapNullToDisabled(format.getEscapeCharacter());}}
From this line I (wrongly) deduced that an unspecified escape character would
actually disable escaping. Because of that I wanted to enable it by setting it
to {{'"'}}, which causes exceptions in the Lexer for perfectly valid input.
That in turn convinced me that this is a way bigger issue than it is. Sorry
about that.
I don't think that the current situation is ideal, though. I would not have
been this confused if {{CSVFormat}} would be more explicit about the escape
char that will be used, i.e. if {{toString()}} would show the implicitly used
quote character. It is currently omitted from the output if it is not set
explicitly.
h4. Relevant part of the RFC:
7. If double-quotes are used to enclose fields, then a double-quote
appearing inside a field must be escaped by preceding it with
another double quote. For example:
"aaa","b""bb","ccc"
h4. Related issue:
https://issues.apache.org/jira/browse/CSV-150
> CSVFormat does not support explicit " as escape char
> ----------------------------------------------------
>
> Key: CSV-294
> URL: https://issues.apache.org/jira/browse/CSV-294
> Project: Commons CSV
> Issue Type: Bug
> Affects Versions: 1.9.0
> Reporter: Joern Huxhorn
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.20.1#820001)