[ https://issues.apache.org/jira/browse/SPARK-25251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16594522#comment-16594522 ]
Hyukjin Kwon commented on SPARK-25251:
--------------------------------------

This is a duplicate of SPARK-22236

> Make spark-csv's `quote` and `escape` options conform to RFC 4180
> -----------------------------------------------------------------
>
>                 Key: SPARK-25251
>                 URL: https://issues.apache.org/jira/browse/SPARK-25251
>             Project: Spark
>          Issue Type: Bug
>          Components: Input/Output
>    Affects Versions: 2.3.0, 2.3.1, 2.4.0, 3.0.0
>            Reporter: Ruslan Dautkhanov
>            Priority: Major
>
> As described in [RFC-4180|https://tools.ietf.org/html/rfc4180], page 2:
> {noformat}
> 7.  If double-quotes are used to enclose fields, then a double-quote
>     appearing inside a field must be escaped by preceding it with
>     another double quote
> {noformat}
> That's what Excel does by default, for example.
> In Spark (as of Spark 2.1), however, escaping is done by default in a
> non-RFC way, using a backslash (\). To fix this you have to explicitly
> tell Spark to use a double quote as the escape character:
> {code}
> .option('quote', '"')
> .option('escape', '"')
> {code}
> This may explain why a comma character wasn't interpreted as being inside
> a quoted column.
> So this is a request to make the spark-csv reader RFC-4180 compatible
> with regard to the default values of the `quote` and `escape` options
> (make both equal to ").
> Since this is a backward-incompatible change, Spark 3.0 might be a good
> release for it.
> Some more background: https://stackoverflow.com/a/45138591/470583

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
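The RFC 4180 behavior the issue asks Spark to adopt by default can be illustrated without Spark: Python's standard-library csv module already escapes embedded double quotes by doubling them (its default `doublequote=True` dialect). A minimal sketch, using made-up sample data, showing that a quote inside a field is written as `""` and that a comma inside a quoted field is not treated as a delimiter:

```python
import csv
import io

# RFC 4180, rule 7: a double quote inside a quoted field is escaped by
# preceding it with another double quote. Python's csv module does this
# by default, matching Excel and the behavior requested for Spark.

# Write a field that contains both a comma and a double quote.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(['id', 'note'])
writer.writerow([1, 'said "hi", then left'])
print(buf.getvalue())
# The quote is doubled and the comma stays inside one quoted field:
#   1,"said ""hi"", then left"

# Reading the data back recovers the original values.
rows = list(csv.reader(io.StringIO(buf.getvalue())))
print(rows[1])  # ['1', 'said "hi", then left']
```

With Spark itself (as of the versions listed above), reading such a file correctly requires the explicit `.option('quote', '"').option('escape', '"')` settings quoted in the issue, since the default escape character is a backslash.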