Haejoon Lee created SPARK-43877:
-----------------------------------
Summary: Fix behavior difference for compare binary functions.
Key: SPARK-43877
URL: https://issues.apache.org/jira/browse/SPARK-43877
Project: Spark
Issue Type: Sub-task
Components: Pandas API on Spark, PySpark
Affects Versions: 3.5.0
Reporter: Haejoon Lee
In [https://github.com/apache/spark/pull/41362], we added `result =
result.fillna(False)` to close the behavior gap between pandas and pandas API
on Spark, but this should be fixed internally on the Spark Connect side. Please
refer to the reproducible code below:
{code:python}
import pandas as pd
import pyspark.pandas as ps
from pyspark.sql.utils import pyspark_column_op

pser = pd.Series([None, None, None])
psser = ps.from_pandas(pser)

pyspark_column_op("__ge__")(psser, psser)
# Wrong result:
# 0    None
# 1    None
# 2    None
# dtype: object

# Expected result (what pandas returns):
pser >= pser
# 0    False
# 1    False
# 2    False
# dtype: bool
{code}
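The gap described above can be illustrated without Spark. The following is a minimal pure-Python sketch, not PySpark's actual implementation: the helper names `sql_style_ge` and `fill_none` are hypothetical and exist only for this illustration. It assumes SQL-style null propagation, where a comparison with a null operand yields null, and shows how filling the nulls with False, as the `fillna(False)` workaround does, recovers the result pandas gives.

```python
from typing import List, Optional


def sql_style_ge(left: List[Optional[int]],
                 right: List[Optional[int]]) -> List[Optional[bool]]:
    """Element-wise >= with SQL-style null propagation (hypothetical helper).

    If either operand is None, the result for that position is None.
    """
    return [
        None if l is None or r is None else l >= r
        for l, r in zip(left, right)
    ]


def fill_none(values: List[Optional[bool]], fill: bool) -> List[bool]:
    """Replace None entries with `fill`, mimicking fillna (hypothetical helper)."""
    return [fill if v is None else v for v in values]


data = [None, None, None]

# Null-propagating comparison: every position stays None.
raw = sql_style_ge(data, data)

# Filling with False matches the all-False result pandas returns.
patched = fill_none(raw, False)

print(raw)
print(patched)
```

In this sketch the fill step is applied after the comparison, which is the shape of the current workaround; the issue asks for the equivalent behavior to be handled internally on the Spark Connect side instead.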