Haejoon Lee created SPARK-43877:
-----------------------------------

             Summary: Fix behavior difference for compare binary functions.
                 Key: SPARK-43877
                 URL: https://issues.apache.org/jira/browse/SPARK-43877
             Project: Spark
          Issue Type: Sub-task
          Components: Pandas API on Spark, PySpark
    Affects Versions: 3.5.0
            Reporter: Haejoon Lee


In [https://github.com/apache/spark/pull/41362], we added `result = 
result.fillna(False)` to close the behavior gap between pandas and the pandas 
API on Spark, but this should instead be fixed internally on the Spark Connect 
side. See the reproducible example below:

{code:python}
import pandas as pd
import pyspark.pandas as ps
from pyspark.sql.utils import pyspark_column_op

pser = pd.Series([None, None, None])
psser = ps.from_pandas(pser)
pyspark_column_op("__ge__")(psser, psser)
# Wrong result:
#  0    None
#  1    None
#  2    None
#  dtype: object

# Expected result (pandas, same operator):
pser >= pser
#  0    False
#  1    False
#  2    False
#  dtype: bool
{code}
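To illustrate the gap that the `fillna(False)` workaround papers over, here is a pure-Python sketch that needs no Spark. It assumes Spark follows standard SQL three-valued logic, where `>=` with a NULL operand yields NULL, whereas pandas returns False for comparisons involving missing values. `sql_ge` and `fill_missing` are hypothetical helpers for illustration only, not PySpark APIs.

{code:python}
from typing import Any, List, Optional


# Hypothetical helper: mimics SQL three-valued logic, where a comparison
# involving NULL (None) yields NULL instead of True/False.
def sql_ge(left: Optional[Any], right: Optional[Any]) -> Optional[bool]:
    if left is None or right is None:
        return None
    return left >= right


# Hypothetical helper: replaces missing results with a fill value,
# analogous to calling fillna(False) on the comparison result.
def fill_missing(values: List[Optional[bool]], fill: bool) -> List[bool]:
    return [fill if v is None else v for v in values]


left = [None, None, None]
right = [None, None, None]

# Element-wise comparison with SQL semantics: every row is NULL (None).
raw = [sql_ge(a, b) for a, b in zip(left, right)]
print(raw)  # [None, None, None]

# Workaround: fill NULL results with False, which matches pandas here.
fixed = fill_missing(raw, False)
print(fixed)  # [False, False, False]
{code}

The point of the issue is that this post-processing step should not be needed in the pandas API on Spark; the comparison itself should already produce the pandas-compatible result.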


--
This message was sent by Atlassian Jira
(v8.20.10#820010)
