[ https://issues.apache.org/jira/browse/SPARK-38067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Maciej Szymkiewicz resolved SPARK-38067.
----------------------------------------
    Fix Version/s: 3.3.0
       Resolution: Fixed

Issue resolved by pull request 35296
[https://github.com/apache/spark/pull/35296]

> Inconsistent missing values handling in Pandas on Spark to_json
> ---------------------------------------------------------------
>
>                 Key: SPARK-38067
>                 URL: https://issues.apache.org/jira/browse/SPARK-38067
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.2.1
>            Reporter: Bjørn Jørgensen
>            Assignee: Bjørn Jørgensen
>            Priority: Major
>             Fix For: 3.3.0
>
>
> If {{ps.DataFrame.to_json}} is called without a {{path}} argument, missing
> values are written explicitly:
> {code:python}
> import pandas as pd
> import pyspark.pandas as ps
> pdf = pd.DataFrame({"id": [1, 2, 3], "value": [None, 3, None]})
> psf = ps.from_pandas(pdf)
> psf.to_json()
> ## '[{"id":1,"value":null},{"id":2,"value":3.0},{"id":3,"value":null}]'
> {code}
> This behavior is consistent with Pandas:
> {code:python}
> pdf.to_json()
> ## '{"id":{"0":1,"1":2,"2":3},"value":{"0":null,"1":3.0,"2":null}}'
> {code}
> However, if {{path}} is provided, missing values are omitted by default:
> {code:python}
> import tempfile
> path = tempfile.mktemp()
> psf.to_json(path)
> spark.read.text(path).show()
> ## +--------------------+
> ## |               value|
> ## +--------------------+
> ## |{"id":2,"value":3.0}|
> ## |            {"id":3}|
> ## |            {"id":1}|
> ## +--------------------+
> {code}
> We should set {{ignoreNullFields}} to {{False}} by default in the Pandas API,
> so that both cases handle missing values in the same way.
> {code:python}
> psf.to_json(path, ignoreNullFields=False)
> spark.read.text(path).show(truncate=False)
> ## +---------------------+
> ## |value                |
> ## +---------------------+
> ## |{"id":3,"value":null}|
> ## |{"id":1,"value":null}|
> ## |{"id":2,"value":3.0} |
> ## +---------------------+
> {code}
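
The difference between the two outputs in the issue can be mimicked in plain Python, without a Spark session. This is only a minimal sketch of what the option controls, not Spark's implementation: the helper {{to_json_lines}} and its {{ignore_null_fields}} parameter are hypothetical names introduced here for illustration.

```python
import json


def to_json_lines(records, ignore_null_fields):
    """Serialize each record as one compact JSON line.

    Hypothetical helper, not part of Spark or pandas: it only imitates
    the effect of ignoreNullFields on fields whose value is None.
    """
    lines = []
    for record in records:
        if ignore_null_fields:
            # Drop keys with missing values, as the path-based writer did by default.
            record = {k: v for k, v in record.items() if v is not None}
        lines.append(json.dumps(record, separators=(",", ":")))
    return lines


records = [
    {"id": 1, "value": None},
    {"id": 2, "value": 3.0},
    {"id": 3, "value": None},
]

# Missing values omitted (the pre-fix behaviour when a path was given).
omitted = to_json_lines(records, ignore_null_fields=True)
# Missing values written as null (the behaviour proposed as the default).
explicit = to_json_lines(records, ignore_null_fields=False)

print(omitted)
print(explicit)
```

With {{ignore_null_fields=True}} the rows for ids 1 and 3 lose their {{value}} key entirely, which is why a reader that infers the schema from such a file may not see the column for those rows; writing explicit nulls keeps every row the same shape.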