[ https://issues.apache.org/jira/browse/SPARK-37575 ]
Guo Wei deleted comment on SPARK-37575:
---------------------------------
was (Author: wayne guo):
I also found a problem when emptyValueInRead is set to "\"\"" and the CSV data
below is read:
{noformat}
name,brand,comment
tesla,,""{noformat}
The actual result is:
||name||brand||comment||
|tesla|null|""|
But the expected result is:
||name||brand||comment||
|tesla|null| |
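
The read case above can be reproduced with a short program. This is a minimal sketch, not part of the original comment: it assumes that emptyValueInRead corresponds to the reader-side CSV option emptyValue, that Spark runs in local mode, and that the object name, method name and temp-file prefix (all hypothetical) are acceptable. It only prints what Spark returns and does not assert either of the two tables above.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper object; the name is not from the issue.
object EmptyValueReadRepro {
  // Sample from the comment: brand is an unquoted empty field, comment is a quoted empty string.
  val sampleCsv: String = "name,brand,comment\ntesla,,\"\"\n"

  def readSample(spark: SparkSession): DataFrame = {
    val file = Files.createTempFile("spark_37575_read", ".csv")
    Files.write(file, sampleCsv.getBytes(StandardCharsets.UTF_8))
    spark.read
      .option("header", "true")
      // Assumption: this user-facing option is what the comment calls emptyValueInRead.
      .option("emptyValue", "\"\"")
      .csv(file.toString)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("spark-37575-read").getOrCreate()
    try readSample(spark).show()
    finally spark.stop()
  }
}
```

Comparing the printed comment column with the two tables above shows which behavior a given Spark version has.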
> Null values are saved as quoted empty Strings "" rather than nothing
> --------------------------------------------------------------------
>
> Key: SPARK-37575
> URL: https://issues.apache.org/jira/browse/SPARK-37575
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.0, 3.2.0
> Reporter: Guo Wei
> Priority: Major
>
> As mentioned in the SQL migration
> guide ([https://spark.apache.org/docs/latest/sql-migration-guide.html#upgrading-from-spark-sql-23-to-24]):
> {noformat}
> Since Spark 2.4, empty strings are saved as quoted empty strings "". In
> version 2.3 and earlier, empty strings are equal to null values and do not
> reflect to any characters in saved CSV files. For example, the row of "a",
> null, "", 1 was written as a,,,1. Since Spark 2.4, the same row is saved as
> a,,"",1. To restore the previous behavior, set the CSV option emptyValue to
> empty (not quoted) string.{noformat}
>
> But actually, both empty strings and null values are saved as quoted empty
> strings "", rather than "" (for empty strings) and nothing (for null values).
> code:
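> {code:scala}
> // In spark-shell, spark.implicits._ is already in scope; elsewhere, import it for toDF.
> import spark.implicits._
> val data = List("spark", null, "").toDF("name")
> data.coalesce(1).write.csv("spark_csv_test")
> {code}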
> actual result:
> {noformat}
> line1: spark
> line2: ""
> line3: ""{noformat}
> expected result:
> {noformat}
> line1: spark
> line2:
> line3: ""
> {noformat}
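
To compare the actual and expected results above on a given Spark version, the write can be wrapped in a small program that prints the raw lines of the output file. This is a sketch under assumptions: the object and method names and the temp-directory prefix are hypothetical, Spark runs in local mode, and the second call sets the CSV option emptyValue to an empty string as the migration guide suggests. The thread does not confirm how that option affects the null row, so no output is claimed here.

```scala
import java.io.File
import java.nio.file.Files
import scala.io.Source
import org.apache.spark.sql.SparkSession

// Hypothetical helper object; the name is not from the issue.
object NullVsEmptyWriteRepro {
  // Writes the three sample rows and returns the raw lines of the CSV part file(s).
  def writeSample(spark: SparkSession, emptyValue: Option[String]): List[String] = {
    import spark.implicits._
    // Spark will not write into an existing directory, so target a fresh child path.
    val out = new File(Files.createTempDirectory("spark_37575_write").toFile, "out")
    val data = List("spark", null, "").toDF("name")
    val writer = data.coalesce(1).write
    // Only set emptyValue when requested, so None exercises the default options.
    emptyValue.fold(writer)(v => writer.option("emptyValue", v)).csv(out.getPath)

    out.listFiles().toList
      .filter(f => f.getName.startsWith("part-") && f.getName.endsWith(".csv"))
      .flatMap { f =>
        val src = Source.fromFile(f, "UTF-8")
        try src.getLines().toList finally src.close()
      }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("spark-37575-write").getOrCreate()
    try {
      println(writeSample(spark, None))     // default options, as in the report
      println(writeSample(spark, Some(""))) // emptyValue set to an empty string
    } finally spark.stop()
  }
}
```

Printing the lines as a list keeps a blank line (null written as nothing) distinguishable from a line containing `""`.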
--
This message was sent by Atlassian Jira
(v8.20.1#820001)