[ https://issues.apache.org/jira/browse/SPARK-20155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15980974#comment-15980974 ]
Armin Braun commented on SPARK-20155:
-------------------------------------

I was able to reproduce this:

{code}
"aaa","b\"b,b","ccc"
{code}

gives us

{code}
scala> spark.read.option("wholeFile", true).csv("file:///tmp/tmp2.csv").show()
+---+-----+---+
|_c0|  _c1|_c2|
+---+-----+---+
|aaa|b"b,b|ccc|
+---+-----+---+
{code}

while

{code}
"aaa","b""b,b","ccc"
{code}

gives us:

{code}
scala> spark.read.option("wholeFile", true).csv("file:///tmp/tmp2.csv").show()
+---+-----+---+---+
|_c0|  _c1|_c2|_c3|
+---+-----+---+---+
|aaa|"b""b| b"|ccc|
{code}

Will try to fix :)

> CSV-files with quoted quotes can't be parsed, if delimiter follows quoted quote
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-20155
>                 URL: https://issues.apache.org/jira/browse/SPARK-20155
>             Project: Spark
>          Issue Type: Bug
>          Components: Input/Output, SQL
>    Affects Versions: 2.0.0
>            Reporter: Rick Moritz
>
> According to:
> https://tools.ietf.org/html/rfc4180#section-2
> 7. If double-quotes are used to enclose fields, then a double-quote
>    appearing inside a field must be escaped by preceding it with
>    another double quote. For example:
>    "aaa","b""bb","ccc"
> This currently works as is, but the following does not:
> "aaa","b""b,b","ccc"
> while "aaa","b\"b,b","ccc" does get parsed.
> I assume this happens because quotes are currently being parsed in pairs,
> and that somehow ends up unquoting the delimiter.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
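
For reference, here is a minimal sketch of how the RFC 4180 rule quoted in the issue (a double quote inside a quoted field is escaped by doubling it) could be applied to a single record. It is not Spark's parser and not the eventual fix: `Rfc4180Sketch` and `parseLine` are hypothetical names made up for illustration, and the sketch assumes one record per input string with no embedded line breaks.

```scala
// Hypothetical illustration only; this is not how Spark parses CSV.
object Rfc4180Sketch {
  // Splits one CSV record into fields, treating a doubled quote inside a
  // quoted field as a single literal quote (RFC 4180, section 2, rule 7).
  def parseLine(line: String, delimiter: Char = ',', quote: Char = '"'): Vector[String] = {
    val fields = Vector.newBuilder[String]
    val current = new StringBuilder
    var inQuotes = false
    var i = 0
    while (i < line.length) {
      val c = line.charAt(i)
      if (inQuotes) {
        if (c == quote) {
          if (i + 1 < line.length && line.charAt(i + 1) == quote) {
            current.append(quote) // doubled quote: emit one literal quote
            i += 1                // and skip the second quote character
          } else {
            inQuotes = false      // lone quote: the quoted section ends
          }
        } else {
          current.append(c)       // delimiters inside quotes are plain data
        }
      } else if (c == quote) {
        inQuotes = true           // opening quote of a quoted field
      } else if (c == delimiter) {
        fields += current.toString // an unquoted delimiter ends the field
        current.clear()
      } else {
        current.append(c)
      }
      i += 1
    }
    fields += current.toString     // last field has no trailing delimiter
    fields.result()
  }
}
```

Under these assumptions, calling `Rfc4180Sketch.parseLine` on the failing record `"aaa","b""b,b","ccc"` yields the three fields `aaa`, `b"b,b` and `ccc`, which is the same split Spark already produces for the backslash-escaped variant shown above.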