[ https://issues.apache.org/jira/browse/SPARK-38961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17533699#comment-17533699 ]
Hyunwoo Park commented on SPARK-38961:
--------------------------------------
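How about doing it this way? The snippet below lists the pyspark.pandas APIs and then, for each class that pandas and pyspark.pandas have in common, prints the pandas methods that pyspark.pandas does not implement yet.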
{code:python}
from inspect import getmembers, isclass, isfunction

import pandas as pd
from pyspark import pandas as ps

# Automatically list the pyspark.pandas APIs as "Class.method".
ps_classes = [name for name, _ in getmembers(ps, isclass)]
for ps_class in ps_classes:
    for method, _ in getmembers(getattr(ps, ps_class), isfunction):
        print(f"{ps_class}.{method}")

# It is also possible to automatically create the list of missing APIs.
# Start from the classes that exist in both pandas and pyspark.pandas.
common_classes = {name for name, _ in getmembers(pd, isclass)} & {
    name for name, _ in getmembers(ps, isclass)
}
print(common_classes)
# {'Series', 'DataFrame', 'MultiIndex', 'DatetimeIndex', 'NamedAgg', 'Index',
#  'Int64Index', 'TimedeltaIndex', 'CategoricalIndex', 'Float64Index'}

# For each common class, report the pandas functions missing from pyspark.pandas.
for _class in common_classes:
    not_implemented = {
        name for name, _ in getmembers(getattr(pd, _class), isfunction)
    } - {name for name, _ in getmembers(getattr(ps, _class), isfunction)}
    print(f"class: {_class}")
    print(f"not_implemented: {not_implemented}")
{code}
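The set difference above could also be wrapped in a small helper so it can be checked without a Spark installation. This is only a rough sketch: `public_functions`, `missing_api`, `Reference` and `Candidate` are hypothetical names used for illustration, and skipping names that start with an underscore is an assumption, not something the snippet above does. One caveat: `isfunction` only matches plain Python functions, so properties and classmethods are not counted by this approach.

```python
from inspect import getmembers, isfunction


def public_functions(cls):
    """Return the names of public plain-function attributes of cls."""
    # Assumption: names starting with "_" are not part of the public API.
    return {
        name for name, _ in getmembers(cls, isfunction) if not name.startswith("_")
    }


def missing_api(reference_cls, candidate_cls):
    """Return sorted names found on reference_cls but not on candidate_cls."""
    return sorted(public_functions(reference_cls) - public_functions(candidate_cls))


# Hypothetical stand-ins for a pandas class and its pyspark.pandas counterpart.
class Reference:
    def head(self): ...
    def tail(self): ...
    def _private(self): ...

    @property
    def shape(self):  # a property, so isfunction does not match it
        return (0, 0)


class Candidate:
    def head(self): ...


print(missing_api(Reference, Candidate))  # ['tail']
```

With the real libraries this would presumably be called as `missing_api(pd.DataFrame, ps.DataFrame)`, and a separate pass would be needed if properties should be included in the list.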
> Enhance to automatically generate the pandas API support list
> -------------------------------------------------------------
>
> Key: SPARK-38961
> URL: https://issues.apache.org/jira/browse/SPARK-38961
> Project: Spark
> Issue Type: Test
> Components: PySpark
> Affects Versions: 3.4.0
> Reporter: Haejoon Lee
> Priority: Major
>
> Currently, the list of supported pandas APIs is maintained manually.
> Generating the list automatically would reduce the maintenance cost.