Yicong Huang created SPARK-55125:
------------------------------------
Summary: Remove redundant __init__ methods in Arrow serializers
Key: SPARK-55125
URL: https://issues.apache.org/jira/browse/SPARK-55125
Project: Spark
Issue Type: Improvement
Components: PySpark
Affects Versions: 4.2.0
Reporter: Yicong Huang
Several Arrow serializers define `__init__` methods that only forward their
parameters to the parent class without adding any logic. These can be removed:
a subclass that does not define `__init__` inherits the parent's `__init__`,
so construction behaves the same.
{code:python}
# CogroupArrowUDFSerializer (line 1280)
def __init__(self, assign_cols_by_name):
    super().__init__(assign_cols_by_name)

# GroupArrowUDFSerializer (line 1053)
def __init__(self, assign_cols_by_name):
    super().__init__(assign_cols_by_name=assign_cols_by_name)

# ArrowStreamAggArrowUDFSerializer (line 1094)
def __init__(self, timezone, safecheck, assign_cols_by_name, arrow_cast):
    super().__init__(
        timezone=timezone,
        safecheck=safecheck,
        assign_cols_by_name=assign_cols_by_name,
        arrow_cast=arrow_cast,
    )
{code}
Compare with `CogroupPandasUDFSerializer`, which correctly omits the redundant
`__init__`:
{code:python}
class CogroupPandasUDFSerializer(ArrowStreamPandasUDFSerializer):
    def load_stream(self, stream):  # no __init__, inherits from parent
        ...
{code}
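To illustrate why the removal should be behavior-preserving, here is a minimal standalone sketch. The classes `BaseSerializer`, `WithRedundantInit`, and `WithoutInit` are hypothetical stand-ins and are not PySpark code; the sketch assumes the child's `__init__` has the same parameters as the parent's and forwards them unchanged.

```python
import inspect


class BaseSerializer:
    """Hypothetical stand-in for a parent serializer; not PySpark code."""

    def __init__(self, assign_cols_by_name):
        self._assign_cols_by_name = assign_cols_by_name


class WithRedundantInit(BaseSerializer):
    # Forwards its only argument unchanged and adds no logic of its own.
    def __init__(self, assign_cols_by_name):
        super().__init__(assign_cols_by_name)


class WithoutInit(BaseSerializer):
    # No __init__ defined: attribute lookup finds BaseSerializer.__init__.
    pass


a = WithRedundantInit(True)
b = WithoutInit(True)

# Both variants end up with the same instance state.
same_state = a._assign_cols_by_name == b._assign_cols_by_name

# The subclass without __init__ reuses the parent's function object.
inherits_parent_init = WithoutInit.__init__ is BaseSerializer.__init__

# The constructor signature seen by callers is unchanged.
same_signature = inspect.signature(WithoutInit) == inspect.signature(WithRedundantInit)
```

The equivalence only holds when the signatures match. If a child's `__init__` narrowed, reordered, or supplied defaults for the parent's parameters, removing it would change what callers can pass, so each of the serializers listed above is worth checking against its parent before deletion.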