[ https://issues.apache.org/jira/browse/SPARK-41815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17653643#comment-17653643 ]
Martin Grund commented on SPARK-41815:
--------------------------------------
The reason seems to be that when Spark Connect serializes the data to Arrow,
the missing value is serialized as `null`, and the subsequent conversion to
pandas turns it into `np.nan` instead of `None`. It seems that we should
manually convert `nan` to `None` for the `Row` type.
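
A minimal sketch of the manual conversion suggested above, using only the standard library. The helper names `nan_to_none` and `clean_rows` are hypothetical, plain dicts stand in for `Row`, and the sample data is invented for illustration; this is not the actual Spark Connect code path.

```python
import math
from typing import Any, Dict, List


def nan_to_none(value: Any) -> Any:
    """Return None for a float NaN; return any other value unchanged."""
    # np.nan is an ordinary Python float, so this check covers it as well.
    if isinstance(value, float) and math.isnan(value):
        return None
    return value


def clean_rows(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Apply nan_to_none to every field of every row (dicts stand in for Row)."""
    return [{key: nan_to_none(val) for key, val in row.items()} for row in rows]


# Hypothetical data, shaped like what a pandas round trip might hand back.
rows = [
    {"name": "Alice", "height": float("nan")},
    {"name": "Bob", "height": 85.0},
]
cleaned = clean_rows(rows)
print(cleaned)
```

One caveat with this approach: it cannot tell a genuine floating-point NaN from a null that became NaN during conversion, so it would presumably need to be limited to positions that are known to have been null in the Arrow data.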
> Column.isNull returns nan instead of None
> -----------------------------------------
>
> Key: SPARK-41815
> URL: https://issues.apache.org/jira/browse/SPARK-41815
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Hyukjin Kwon
> Priority: Major
>
> {code}
> File "/.../spark/python/pyspark/sql/connect/column.py", line 99, in pyspark.sql.connect.column.Column.isNull
> Failed example:
>     df.filter(df.height.isNull()).collect()
> Expected:
>     [Row(name='Alice', height=None)]
> Got:
>     [Row(name='Alice', height=nan)]
> {code}