[
https://issues.apache.org/jira/browse/SPARK-20155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981068#comment-15981068
]
Rick Moritz commented on SPARK-20155:
-------------------------------------
Why shouldn't we change the default escape character, when there's a CSV RFC
which only ever mentions one CSV escape character - and it tricked you and me
both?
Sure, it may break an existing API, but that API already looks broken, given
that shell-style escapes aren't part of the CSV RFC. Sure, it's not a formal
standard, but by diverging from the proposal Spark is making CSV more
fractured than it needs to be.
The option of reading broken CSV files is fine by me, but the default ought to
stay as close to the RFC as possible.
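For comparison, here is a minimal sketch of how a reader that follows RFC 4180's doubled-quote escaping handles the examples from the issue below. It uses Python's standard-library csv module (whose default dialect doubles quotes rather than backslash-escaping them); the sample strings are taken from the issue description, not from Spark itself:

```python
import csv
import io

# RFC 4180, section 2, rule 7: a double quote inside a quoted field is
# escaped by doubling it. Python's default csv dialect follows this.
data = '"aaa","b""b,b","ccc"\r\n'
rows = list(csv.reader(io.StringIO(data)))

# The embedded quote and the comma both survive inside the middle field:
# rows[0] == ['aaa', 'b"b,b', 'ccc']
print(rows[0])
```

A parser that instead treats backslash as the escape character would only accept the `"b\"b,b"` form, which matches the behavior reported for Spark below.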
> CSV-files with quoted quotes can't be parsed, if delimiter follows quoted
> quote
> -------------------------------------------------------------------------------
>
> Key: SPARK-20155
> URL: https://issues.apache.org/jira/browse/SPARK-20155
> Project: Spark
> Issue Type: Bug
> Components: Input/Output, SQL
> Affects Versions: 2.0.0
> Reporter: Rick Moritz
>
> According to :
> https://tools.ietf.org/html/rfc4180#section-2
> 7. If double-quotes are used to enclose fields, then a double-quote
> appearing inside a field must be escaped by preceding it with
> another double quote. For example:
> "aaa","b""bb","ccc"
> This currently works as-is, but the following does not:
> "aaa","b""b,b","ccc"
> while "aaa","b\"b,b","ccc" does get parsed.
> I assume this happens because quotes are currently being parsed in pairs,
> and that somehow ends up unquoting the delimiter.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]