[
https://issues.apache.org/jira/browse/SPARK-55125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated SPARK-55125:
-----------------------------------
Labels: pull-request-available (was: )
> Remove redundant __init__ methods in Arrow serializers
> ------------------------------------------------------
>
> Key: SPARK-55125
> URL: https://issues.apache.org/jira/browse/SPARK-55125
> Project: Spark
> Issue Type: Improvement
> Components: PySpark
> Affects Versions: 4.2.0
> Reporter: Yicong Huang
> Priority: Major
> Labels: pull-request-available
>
> Several serializers define `__init__` methods that only forward their
> parameters to the parent, with no additional logic. These can be removed:
> when a subclass does not define `__init__`, Python resolves the call to
> the parent's `__init__` through normal inheritance.
> {code:python}
> # CogroupArrowUDFSerializer (line 1280)
> def __init__(self, assign_cols_by_name):
>     super().__init__(assign_cols_by_name)
>
> # GroupArrowUDFSerializer (line 1053)
> def __init__(self, assign_cols_by_name):
>     super().__init__(assign_cols_by_name=assign_cols_by_name)
>
> # ArrowStreamAggArrowUDFSerializer (line 1094)
> def __init__(self, timezone, safecheck, assign_cols_by_name, arrow_cast):
>     super().__init__(
>         timezone=timezone,
>         safecheck=safecheck,
>         assign_cols_by_name=assign_cols_by_name,
>         arrow_cast=arrow_cast,
>     )
> {code}
> Compare with `CogroupPandasUDFSerializer`, which correctly omits the
> redundant `__init__`:
> {code:python}
> class CogroupPandasUDFSerializer(ArrowStreamPandasUDFSerializer):
>     def load_stream(self, stream):  # no __init__, inherits from parent
>         ...
> {code}
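
The inheritance behaviour the description relies on can be checked with a minimal sketch. The classes below (`BaseSerializer`, `WithRedundantInit`, `WithoutInit`) are hypothetical stand-ins, not the actual PySpark serializers, and their parameters are assumed for illustration only. The sketch also compares signatures, since dropping a forwarding `__init__` is only behaviour-preserving when the child's signature matches the parent's.

```python
import inspect


class BaseSerializer:
    """Hypothetical parent class; not a real PySpark serializer."""

    def __init__(self, timezone, safecheck, assign_cols_by_name=True):
        self.timezone = timezone
        self.safecheck = safecheck
        self.assign_cols_by_name = assign_cols_by_name


class WithRedundantInit(BaseSerializer):
    # Only forwards its arguments to the parent: this method adds nothing.
    def __init__(self, timezone, safecheck, assign_cols_by_name=True):
        super().__init__(timezone, safecheck, assign_cols_by_name)


class WithoutInit(BaseSerializer):
    # No __init__ defined: attribute lookup finds BaseSerializer.__init__.
    pass


a = WithRedundantInit("UTC", True, False)
b = WithoutInit("UTC", True, False)

# Both instances end up with the same state.
assert vars(a) == vars(b)

# The subclass without __init__ uses the parent's function object directly.
assert WithoutInit.__init__ is BaseSerializer.__init__

# Removal is safe here because the forwarding signature matches the parent's.
assert inspect.signature(WithRedundantInit.__init__) == inspect.signature(
    BaseSerializer.__init__
)
```

If a forwarding `__init__` accepts fewer parameters than its parent, or passes them positionally in a different order, removing it changes which arguments callers may supply, so the signature comparison is worth running before each removal.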