dgd_contributor created SPARK-36785:
---------------------------------------
Summary: Fix ps.DataFrame.isin
Key: SPARK-36785
URL: https://issues.apache.org/jira/browse/SPARK-36785
Project: Spark
Issue Type: Sub-task
Components: PySpark
Affects Versions: 3.3.0
Reporter: dgd_contributor
Description: In the example below, where `other` contains None, ps.DataFrame.isin returns None instead of False for every non-matching cell, even though the reported dtypes are bool. pandas returns False in the same cells.
{code:python}
>>> psdf = ps.DataFrame(
...     {
...         "a": [None, 2, 3, 4, 5, 6, 7, 8, None],
...         "b": [None, 5, None, 3, 2, 1, None, 0, 0],
...         "c": [1, 5, 1, 3, 2, 1, 1, 0, 0],
...     },
... )
>>>
>>> psdf
     a    b  c
0  NaN  NaN  1
1  2.0  5.0  5
2  3.0  NaN  1
3  4.0  3.0  3
4  5.0  2.0  2
5  6.0  1.0  1
6  7.0  NaN  1
7  8.0  0.0  0
8  NaN  0.0  0
>>> other = [1, 2, None]
>>> psdf.isin(other)
      a     b     c
0  None  None  True
1  True  None  None
2  None  None  True
3  None  None  None
4  None  True  True
5  None  True  True
6  None  None  True
7  None  None  None
8  None  None  None
>>> psdf.isin(other).dtypes
a    bool
b    bool
c    bool
dtype: object
>>> psdf.to_pandas().isin(other).dtypes
a    bool
b    bool
c    bool
dtype: object
>>> psdf.to_pandas().isin(other)
       a      b      c
0  False  False   True
1   True  False  False
2  False  False   True
3  False  False  False
4  False   True   True
5  False   True   True
6  False  False   True
7  False  False  False
8  False  False  False
>>>
{code}
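The None cells above match what SQL three-valued logic produces for `x IN (1, 2, NULL)`. The sketch below is a pure-Python model of that behaviour, not the pandas-on-Spark implementation. It assumes `isin` is evaluated as a SQL-style IN on each column, and the helper names `sql_in` and `isin_like_pandas` are hypothetical. It shows how coalescing the unknown result to False reproduces the pandas output from the example.

```python
from typing import Optional, Sequence


def sql_in(value, candidates: Sequence) -> Optional[bool]:
    """Model SQL three-valued IN: True on a match, None (NULL) when unknown, else False."""
    if value is None:
        # NULL compared with anything is unknown.
        return None
    if any(c is not None and c == value for c in candidates):
        return True
    if any(c is None for c in candidates):
        # No match, but a NULL candidate makes the result unknown rather than False.
        return None
    return False


def isin_like_pandas(value, candidates: Sequence) -> bool:
    """Coalesce the unknown result to False, which is what pandas reports."""
    result = sql_in(value, candidates)
    return False if result is None else result


# Data copied from the example in this issue (NaN is written as None).
other = [1, 2, None]
a = [None, 2, 3, 4, 5, 6, 7, 8, None]
c = [1, 5, 1, 3, 2, 1, 1, 0, 0]

a_sql = [sql_in(v, other) for v in a]
a_fixed = [isin_like_pandas(v, other) for v in a]
c_sql = [sql_in(v, other) for v in c]
c_fixed = [isin_like_pandas(v, other) for v in c]

print(a_sql)    # mirrors column a of psdf.isin(other)
print(a_fixed)  # mirrors column a of psdf.to_pandas().isin(other)
```

If the assumption holds, a fix would need to replace the unknown result with False before returning, so that the values agree with the bool dtype that is already reported.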