[ https://issues.apache.org/jira/browse/SPARK-37575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524600#comment-17524600 ]
Apache Spark commented on SPARK-37575:
--------------------------------------
User 'anchovYu' has created a pull request for this issue:
https://github.com/apache/spark/pull/36268
> null values should be saved as nothing rather than quoted empty Strings ""
> with default settings
> ------------------------------------------------------------------------------------------------
>
> Key: SPARK-37575
> URL: https://issues.apache.org/jira/browse/SPARK-37575
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.0, 3.2.0
> Reporter: Wei Guo
> Assignee: Wei Guo
> Priority: Major
> Fix For: 3.3.0
>
>
> As mentioned in the SQL migration guide
> ([https://spark.apache.org/docs/latest/sql-migration-guide.html#upgrading-from-spark-sql-23-to-24]):
> {noformat}
> Since Spark 2.4, empty strings are saved as quoted empty strings "". In
> version 2.3 and earlier, empty strings are equal to null values and do not
> reflect to any characters in saved CSV files. For example, the row of "a",
> null, "", 1 was written as a,,,1. Since Spark 2.4, the same row is saved as
> a,,"",1. To restore the previous behavior, set the CSV option emptyValue to
> empty (not quoted) string.{noformat}
> In practice, however, both empty strings and null values are saved as quoted
> empty strings "". The expected output is "" for empty strings and nothing for
> null values.
> code:
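> {code:scala}
> // Run in spark-shell, where spark.implicits._ (needed for toDF) is already in scope.
> val data = List("spark", null, "").toDF("name")
> // Write a single CSV part file with the default options.
> data.coalesce(1).write.csv("spark_csv_test")
> {code}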
> actual result:
> {noformat}
> line1: spark
> line2: ""
> line3: ""{noformat}
> expected result:
> {noformat}
> line1: spark
> line2:
> line3: ""
> {noformat}
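
The rule implied by the expected result above can be sketched in plain Scala, without Spark. This is only an illustrative model, not Spark's CSV writer: the object CsvNullSketch, its methods, and the two default constants are hypothetical names, and quoting or escaping of non-empty values is deliberately left out.

```scala
// Hypothetical model of the behaviour the issue expects; not Spark's implementation.
object CsvNullSketch {
  // Assumed defaults, taken from the expected result in the issue:
  // a null becomes nothing, an empty string becomes a quoted empty string.
  val DefaultNullValue: String = ""
  val DefaultEmptyValue: String = "\"\""

  /** Renders a single field, distinguishing null from the empty string. */
  def renderField(
      value: String,
      nullValue: String = DefaultNullValue,
      emptyValue: String = DefaultEmptyValue): String =
    if (value == null) nullValue
    else if (value.isEmpty) emptyValue
    else value

  /** Joins the rendered fields with commas (no quoting of non-empty values). */
  def renderRow(fields: Seq[String]): String =
    fields.map(f => renderField(f)).mkString(",")

  def main(args: Array[String]): Unit = {
    // The three single-column rows from the reproduction in the issue.
    Seq("spark", null, "").foreach(v => println(renderRow(Seq(v))))
  }
}
```

Under these assumptions, the row "a", null, "", 1 from the migration guide renders as a,,"",1, which is the output the guide describes for Spark 2.4 and later.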
--
This message was sent by Atlassian Jira
(v8.20.7#820007)