dgd_contributor created SPARK-36785:
---------------------------------------

             Summary: Fix ps.DataFrame.isin
                 Key: SPARK-36785
                 URL: https://issues.apache.org/jira/browse/SPARK-36785
             Project: Spark
          Issue Type: Sub-task
          Components: PySpark
    Affects Versions: 3.3.0
            Reporter: dgd_contributor


{code:python}
>>> import pyspark.pandas as ps
>>> psdf = ps.DataFrame(
...     {
...         "a": [None, 2, 3, 4, 5, 6, 7, 8, None],
...         "b": [None, 5, None, 3, 2, 1, None, 0, 0],
...         "c": [1, 5, 1, 3, 2, 1, 1, 0, 0],
...     },
... )
>>> psdf
     a    b  c
0  NaN  NaN  1
1  2.0  5.0  5
2  3.0  NaN  1
3  4.0  3.0  3
4  5.0  2.0  2
5  6.0  1.0  1
6  7.0  NaN  1
7  8.0  0.0  0
8  NaN  0.0  0
>>> other = [1, 2, None]

>>> psdf.isin(other)
      a     b     c
0  None  None  True
1  True  None  None
2  None  None  True
3  None  None  None
4  None  True  True
5  None  True  True
6  None  None  True
7  None  None  None
8  None  None  None
>>> psdf.isin(other).dtypes
a    bool
b    bool
c    bool
dtype: object
>>> psdf.to_pandas().isin(other).dtypes
a    bool
b    bool
c    bool
dtype: object
>>> psdf.to_pandas().isin(other)
       a      b      c
0  False  False   True
1   True  False  False
2  False  False   True
3  False  False  False
4  False   True   True
5  False   True   True
6  False  False   True
7  False  False  False
8  False  False  False
{code}
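The session above shows `ps.DataFrame.isin` returning `None` wherever pandas returns `False`. One plausible reading, stated here as an assumption rather than a confirmed diagnosis, is that the comparison follows SQL three-valued logic: `x IN (1, 2, NULL)` evaluates to NULL instead of false when `x` matches nothing. The plain-Python sketch below emulates that rule and the pandas-style result; `sql_in` and `pandas_like_isin` are hypothetical helper names used for illustration only, not Spark or pandas APIs.

```python
from typing import Any, List, Optional


def sql_in(value: Any, candidates: List[Any]) -> Optional[bool]:
    """Emulate SQL `value IN (candidates)` with three-valued logic (illustrative)."""
    if value is None:
        # NULL compared against a non-empty list is NULL.
        return None
    non_null = [c for c in candidates if c is not None]
    if value in non_null:
        return True
    # No match: NULL if the list contained a NULL, otherwise False.
    return None if len(non_null) != len(candidates) else False


def pandas_like_isin(value: Any, candidates: List[Any]) -> bool:
    """Collapse NULL to False, mirroring the pandas output shown in the issue."""
    return sql_in(value, candidates) is True


# Values taken from column "a" and `other` in the report above.
other = [1, 2, None]
column_a = [None, 2, 3, 4, 5, 6, 7, 8, None]

print([sql_in(v, other) for v in column_a])
print([pandas_like_isin(v, other) for v in column_a])
```

Under this assumption, the first list reproduces the `None` entries reported for column `a`, and the second reproduces the pandas result, which suggests that treating a NULL comparison result as `False` would restore pandas-compatible output.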



--
This message was sent by Atlassian Jira
(v8.3.4#803005)