[ https://issues.apache.org/jira/browse/SPARK-25251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16594522#comment-16594522 ]

Hyukjin Kwon commented on SPARK-25251:
--------------------------------------

This is a duplicate of SPARK-22236

> Make spark-csv's `quote` and `escape` options conform to RFC 4180
> -----------------------------------------------------------------
>
>                 Key: SPARK-25251
>                 URL: https://issues.apache.org/jira/browse/SPARK-25251
>             Project: Spark
>          Issue Type: Bug
>          Components: Input/Output
>    Affects Versions: 2.3.0, 2.3.1, 2.4.0, 3.0.0
>            Reporter: Ruslan Dautkhanov
>            Priority: Major
>
> As described in [RFC-4180|https://tools.ietf.org/html/rfc4180], page 2 -
> {noformat}
>    7. If double-quotes are used to enclose fields, then a double-quote 
> appearing inside a field must be escaped by preceding it with another double 
> quote
> {noformat}
> That's what Excel does, for example, by default.
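> For illustration, a hypothetical RFC-4180 record whose second field contains
> both an embedded comma and a doubled double-quote:
> {noformat}
> 1,"He said ""hello, world""",3
> {noformat}
> An RFC-compliant parser reads this as three fields, the second one being
> {{He said "hello, world"}}. With backslash escaping, the doubled quote is not
> collapsed and the embedded comma can be mistaken for a field separator.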
> In Spark (as of Spark 2.1), however, escaping is done by default in a
> non-RFC way, using the backslash (\). To fix this you have to explicitly
> tell Spark to use the double quote as the escape character:
> {code}
> .option('quote', '"')   # quoting character (also Spark's default)
> .option('escape', '"')  # use the quote itself as the escape, per RFC 4180
> {code}
> This may explain cases where a comma was treated as a field separator even 
> though it appeared inside a quoted column.
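> As a minimal end-to-end sketch (assuming an existing SparkSession named
> {{spark}} and a hypothetical input file {{data.csv}}):
> {code}
> # Minimal sketch: read an RFC-4180 CSV where quotes are escaped by doubling.
> # 'spark' is an assumed existing SparkSession; 'data.csv' is a hypothetical path.
> df = (spark.read
>     .option('header', 'true')
>     .option('quote', '"')
>     .option('escape', '"')
>     .csv('data.csv'))
> {code}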
> So this is a request to make the spark-csv reader RFC-4180 compliant with 
> regard to the default values of the `quote` and `escape` options (make both 
> equal to ").
> Since this is a backward-incompatible change, Spark 3.0 might be a good 
> release in which to make it.
> Some more background - https://stackoverflow.com/a/45138591/470583 


