Jonathan created SPARK-29610:
--------------------------------

             Summary: Keys with Null values are discarded when using to_json 
function
                 Key: SPARK-29610
                 URL: https://issues.apache.org/jira/browse/SPARK-29610
             Project: Spark
          Issue Type: Bug
          Components: Build
    Affects Versions: 2.4.4
            Reporter: Jonathan


When to_json is called on a struct, any field whose value is null is 
omitted from the resulting JSON string.
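{code:python}
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

# In the pyspark shell `spark` already exists; getOrCreate() reuses it.
spark = SparkSession.builder.getOrCreate()

rows = [("a", "foo"), ("b", None)]
df = spark.createDataFrame(rows, ["id", "data"])

# Pack every column into one struct, then serialize that struct to JSON.
(
  df.select(F.struct("*").alias("payload"))
    .withColumn("payload", F.to_json(F.col("payload")))
    .select("payload")
    .show()
){code}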
Produces the following output:
{noformat}
+--------------------+
|             payload|
+--------------------+
|{"id":"a","data":...|
|          {"id":"b"}|
+--------------------+{noformat}
The `data` key in the second row has been silently dropped rather than 
written out as "data":null.
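
To make the difference concrete without a Spark cluster, here is a minimal plain-Python sketch using the standard `json` module. It is only a model of the behavior described above, not Spark's implementation. The helper names `dumps_dropping_nulls` and `dumps_keeping_nulls` are hypothetical and exist only for this illustration.

```python
import json

# Same two rows as in the reproduction above, as plain dicts.
rows = [{"id": "a", "data": "foo"}, {"id": "b", "data": None}]


def dumps_dropping_nulls(row):
    """Model of the reported behavior: keys whose value is None are omitted."""
    kept = {key: value for key, value in row.items() if value is not None}
    return json.dumps(kept, separators=(",", ":"))


def dumps_keeping_nulls(row):
    """Model of the behavior the report asks for: None is written as null."""
    return json.dumps(row, separators=(",", ":"))


reported = [dumps_dropping_nulls(row) for row in rows]
expected = [dumps_keeping_nulls(row) for row in rows]

for got, want in zip(reported, expected):
    print(got, "vs", want)
```

The first row serializes identically either way. Only the second row differs, which matches the output shown in the report. As far as I know, Spark 3.0 and later accept an `ignoreNullFields` JSON option that can be set to false to keep such keys, but that option does not apply to the affected version 2.4.4.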



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
