[
https://issues.apache.org/jira/browse/SPARK-20155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15980974#comment-15980974
]
Armin Braun commented on SPARK-20155:
-------------------------------------
I was able to reproduce this:
{code}
"aaa","b\"b,b","ccc"
{code}
gives us
{code}
scala> spark.read.option("wholeFile", true).csv("file:///tmp/tmp2.csv").show()
+---+-----+---+
|_c0|  _c1|_c2|
+---+-----+---+
|aaa|b"b,b|ccc|
+---+-----+---+
{code}
while
{code}
"aaa","b""b,b","ccc"
{code}
gives us:
{code}
scala> spark.read.option("wholeFile", true).csv("file:///tmp/tmp2.csv").show()
+---+-----+---+---+
|_c0|  _c1|_c2|_c3|
+---+-----+---+---+
|aaa|"b""b| b"|ccc|
+---+-----+---+---+
{code}
Will try to fix :)
> CSV-files with quoted quotes can't be parsed, if delimiter follows quoted
> quote
> -------------------------------------------------------------------------------
>
> Key: SPARK-20155
> URL: https://issues.apache.org/jira/browse/SPARK-20155
> Project: Spark
> Issue Type: Bug
> Components: Input/Output, SQL
> Affects Versions: 2.0.0
> Reporter: Rick Moritz
>
> According to:
> https://tools.ietf.org/html/rfc4180#section-2
> 7. If double-quotes are used to enclose fields, then a double-quote
> appearing inside a field must be escaped by preceding it with
> another double quote. For example:
> "aaa","b""bb","ccc"
> This currently works as is, but the following does not:
> "aaa","b""b,b","ccc"
> while "aaa","b\"b,b","ccc" does get parsed.
> I assume this happens because quotes are currently parsed in pairs,
> which somehow ends up unquoting the delimiter.
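
For illustration, the RFC 4180 rule quoted above can be sketched as a small standalone record splitter. This is not Spark's CSV parser and not the proposed fix; `Rfc4180Sketch` and `splitRecord` are hypothetical names, and the sketch assumes a single-line record with no embedded newlines. It only shows that a doubled quote inside a quoted field should yield one literal quote without ending the field, so a delimiter that follows it stays part of the field.

```scala
object Rfc4180Sketch {
  // Hypothetical helper: splits one CSV record into fields.
  // Inside a quoted field, "" is read as a single literal quote (RFC 4180, rule 7).
  def splitRecord(line: String, delimiter: Char = ',', quote: Char = '"'): Vector[String] = {
    val fields = Vector.newBuilder[String]
    val current = new StringBuilder
    var inQuotes = false
    var i = 0
    while (i < line.length) {
      val c = line.charAt(i)
      if (inQuotes) {
        if (c == quote) {
          if (i + 1 < line.length && line.charAt(i + 1) == quote) {
            current.append(quote) // doubled quote: keep one, stay inside the field
            i += 1                // skip the second quote of the pair
          } else {
            inQuotes = false      // lone quote: closes the quoted field
          }
        } else {
          current.append(c)       // delimiters inside quotes are ordinary characters
        }
      } else if (c == quote) {
        inQuotes = true           // opening quote
      } else if (c == delimiter) {
        fields += current.toString
        current.clear()
      } else {
        current.append(c)
      }
      i += 1
    }
    fields += current.toString    // last field has no trailing delimiter
    fields.result()
  }

  def main(args: Array[String]): Unit = {
    // The failing record from the issue description.
    val record = "\"aaa\",\"b\"\"b,b\",\"ccc\""
    println(splitRecord(record).mkString(" | ")) // prints: aaa | b"b,b | ccc
  }
}
```

Under these assumptions the record from the issue splits into three fields, which is the result the reporter expected from Spark.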