[ 
https://issues.apache.org/jira/browse/SPARK-25506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-25506.
----------------------------------
    Resolution: Duplicate

> Spark CSV multiline with CRLF
> -----------------------------
>
>                 Key: SPARK-25506
>                 URL: https://issues.apache.org/jira/browse/SPARK-25506
>             Project: Spark
>          Issue Type: Improvement
>          Components: Input/Output
>    Affects Versions: 2.2.0, 2.3.1
>         Environment: spark 2.2.0 and 2.3.1
> scala 2.11.8
>            Reporter: eugen yushin
>            Priority: Major
>
> Spark produces empty rows when reading a CSV file whose lines end with 
> '\r\n' in multiLine mode: the trailing '\r' is carried into the last 
> field, so `collect().foreach(println)` emits e.g. "[1,1\r]", which a 
> terminal renders as "]". Note, no fields are escaped in the original 
> input file.
> {code:java}
> val multilineDf = sparkSession.read
>   .format("csv")
>   .options(Map("header" -> "true", "inferSchema" -> "false", "escape" -> "\"", "multiLine" -> "true"))
>   .load("src/test/resources/multiLineHeader.csv")
> val df = sparkSession.read
>   .format("csv")
>   .options(Map("header" -> "true", "inferSchema" -> "false", "escape" -> "\""))
>   .load("src/test/resources/multiLineHeader.csv")
> multilineDf.show()
> multilineDf.collect().foreach(println)
> df.show()
> df.collect().foreach(println)
> {code}
> Result:
> {code:java}
> +----+-----+
> |
> +----+-----+
> |
> |
> +----+-----+
> ]
> ]
> +----+----+
> |col1|col2|
> +----+----+
> |   1|   1|
> |   2|   2|
> +----+----+
> [1,1]
> [2,2]
> {code}
> Input file:
> {code:java}
> cat -vt src/test/resources/multiLineHeader.csv
> col1,col2^M
> 1,1^M
> 2,2^M
> {code}
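As a workaround (an editor's sketch, not from the ticket itself, and assuming the CRLF line endings are the only problem), one can normalize '\r\n' to '\n' before handing the file to Spark's multiLine CSV reader; the file paths below are hypothetical:

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Rewrite a file with CRLF ("\r\n") line endings as LF ("\n"), so the
// multiLine CSV parser does not carry a stray '\r' into the last field.
// Reads the whole file into memory, so it suits small test fixtures.
def normalizeLineEndings(in: Path, out: Path): Unit = {
  val content = new String(Files.readAllBytes(in), StandardCharsets.UTF_8)
  val normalized = content.replace("\r\n", "\n")
  Files.write(out, normalized.getBytes(StandardCharsets.UTF_8))
}
```

The normalized copy can then be loaded with the same multiLine options as in the reproduction above.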



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
