[
https://issues.apache.org/jira/browse/SPARK-20155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981068#comment-15981068
]
Rick Moritz commented on SPARK-20155:
-------------------------------------
Why shouldn't we change the default escape character, when there's a CSV RFC
which only ever mentions one CSV escape character - and it tricked you and me
both?
Sure, it may break an existing API, but that API already looks broken, given
that shell-style escapes aren't part of the CSV RFC. Sure, it's not a formal
standard, but by diverging from the proposal Spark is making CSV more
fractured than it needs to be.
The option of reading broken CSV files is fine by me, but the default ought to
stay as close to the RFC as possible.
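For comparison, here is a minimal sketch of how a reader that follows RFC 4180's doubled-quote escaping handles the examples from the issue below. It uses Python's standard-library csv module (whose default dialect doubles quotes rather than backslash-escaping them); the sample strings are taken from the issue description, not from Spark itself:

```python
import csv
import io

# RFC 4180, section 2, rule 7: a double quote inside a quoted field is
# escaped by doubling it. Python's default csv dialect follows this.
data = '"aaa","b""b,b","ccc"\r\n'
rows = list(csv.reader(io.StringIO(data)))

# The embedded quote and the comma both survive inside the middle field:
# rows[0] == ['aaa', 'b"b,b', 'ccc']
print(rows[0])
```

A parser that instead treats backslash as the escape character would only accept the `"b\"b,b"` form, which matches the behavior reported for Spark below.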
> CSV-files with quoted quotes can't be parsed, if delimiter follows quoted
> quote
> -------------------------------------------------------------------------------
>
> Key: SPARK-20155
> URL: https://issues.apache.org/jira/browse/SPARK-20155
> Project: Spark
> Issue Type: Bug
> Components: Input/Output, SQL
> Affects Versions: 2.0.0
> Reporter: Rick Moritz
>
> According to :
> https://tools.ietf.org/html/rfc4180#section-2
> 7. If double-quotes are used to enclose fields, then a double-quote
> appearing inside a field must be escaped by preceding it with
> another double quote. For example:
> "aaa","b""bb","ccc"
> This currently works as-is, but the following does not:
> "aaa","b""b,b","ccc"
> while "aaa","b\"b,b","ccc" does get parsed.
> I assume this happens because quotes are currently being parsed in pairs,
> and that somehow ends up unquoting the delimiter.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]