[jira] (SPARK-37575) Null values are saved as quoted empty Strings "" rather than nothing

Guo Wei (Jira) Thu, 09 Dec 2021 11:21:34 -0800


    [ https://issues.apache.org/jira/browse/SPARK-37575 ]



    Guo Wei deleted comment on SPARK-37575:
    ---------------------------------

was (Author: wayne guo):
I also found that if emptyValueInRead is set to "\"\"", reading CSV data as shown 
below:
{noformat}
name,brand,comment
tesla,,""{noformat}
The actual result is:
||name||brand||comment||
|tesla|null|""|

But the expected result is:
||name||brand||comment||
|tesla|null| |

> Null values are saved as quoted empty Strings "" rather than nothing
> --------------------------------------------------------------------
>
>                 Key: SPARK-37575
>                 URL: https://issues.apache.org/jira/browse/SPARK-37575
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.0, 3.2.0
>            Reporter: Guo Wei
>            Priority: Major
>
> As mentioned in the SQL migration guide 
> ([https://spark.apache.org/docs/latest/sql-migration-guide.html#upgrading-from-spark-sql-23-to-24]):
> {noformat}
> Since Spark 2.4, empty strings are saved as quoted empty strings "". In 
> version 2.3 and earlier, empty strings are equal to null values and do not 
> reflect to any characters in saved CSV files. For example, the row of "a", 
> null, "", 1 was written as a,,,1. Since Spark 2.4, the same row is saved as 
> a,,"",1. To restore the previous behavior, set the CSV option emptyValue to 
> empty (not quoted) string.{noformat}
>  
> In practice, however, both empty strings and null values are saved as quoted 
> empty strings "", instead of "" for empty strings and nothing for null values.
> Code:
> {code:scala}
> // Run in spark-shell, where spark.implicits._ (needed for toDF) is already in scope.
> val data = List("spark", null, "").toDF("name")
> data.coalesce(1).write.csv("spark_csv_test")
> {code}
> Actual result:
> {noformat}
> line1: spark
> line2: ""
> line3: ""{noformat}
> Expected result:
> {noformat}
> line1: spark
> line2: 
> line3: ""
> {noformat}
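
The migration-guide passage quoted above names the CSV option emptyValue as the way to restore the pre-2.4 output. A minimal sketch of the reproduction with that option set might look like the following. The object name, the local master, and the output path spark_csv_test_unquoted are assumptions made for illustration; only the emptyValue option comes from the guide, and the sketch does not claim to fix the null handling reported in this issue.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical wrapper object; only the emptyValue option itself comes from the migration guide.
object CsvEmptyValueSketch {
  // Writer options: an unquoted empty string for emptyValue, as the guide suggests.
  val writeOptions: Map[String, String] = Map("emptyValue" -> "")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]") // assumption: a local run for experimentation
      .appName("csv-empty-value-sketch")
      .getOrCreate()
    import spark.implicits._ // needed for toDF

    // Same three rows as in the report: a value, a null, and an empty string.
    val data = List("spark", null, "").toDF("name")

    data.coalesce(1)
      .write
      .mode("overwrite") // allows the sketch to be re-run
      .options(writeOptions)
      .csv("spark_csv_test_unquoted") // hypothetical output path

    // Read the written files back as plain text to inspect the raw lines.
    spark.read.text("spark_csv_test_unquoted").show(false)

    spark.stop()
  }
}
```

With emptyValue set this way, the guide indicates that empty strings no longer produce any characters in the file, so they cannot be told apart from nulls, which matches the 2.3 behavior it describes.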



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

