Yicong Huang created SPARK-55125:
------------------------------------

             Summary: Remove redundant __init__ methods in Arrow serializers
                 Key: SPARK-55125
                 URL: https://issues.apache.org/jira/browse/SPARK-55125
             Project: Spark
          Issue Type: Improvement
          Components: PySpark
    Affects Versions: 4.2.0
            Reporter: Yicong Huang


Several serializers define `__init__` methods that only forward their parameters 
to the parent class without adding any logic. These can be removed: when a 
subclass does not define `__init__`, Python uses the parent's `__init__` directly.

{code:python}
# CogroupArrowUDFSerializer (line 1280)
def __init__(self, assign_cols_by_name):
    super().__init__(assign_cols_by_name)

# GroupArrowUDFSerializer (line 1053)
def __init__(self, assign_cols_by_name):
    super().__init__(assign_cols_by_name=assign_cols_by_name)

# ArrowStreamAggArrowUDFSerializer (line 1094)
def __init__(self, timezone, safecheck, assign_cols_by_name, arrow_cast):
    super().__init__(
        timezone=timezone,
        safecheck=safecheck,
        assign_cols_by_name=assign_cols_by_name,
        arrow_cast=arrow_cast,
    )
{code}

Compare with `CogroupPandasUDFSerializer` which correctly omits redundant 
`__init__`:

{code:python}
class CogroupPandasUDFSerializer(ArrowStreamPandasUDFSerializer):
    def load_stream(self, stream):  # no __init__, inherits from parent
        ...
{code}
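
The equivalence can be checked with a minimal sketch. The classes below (`BaseSerializer`, `WithRedundantInit`, `WithoutInit`) are hypothetical stand-ins, not the real PySpark serializers, and assume the forwarding `__init__` takes exactly the same parameters as its parent:

```python
import inspect


class BaseSerializer:
    """Hypothetical stand-in for a parent serializer; not the real PySpark class."""

    def __init__(self, assign_cols_by_name):
        self.assign_cols_by_name = assign_cols_by_name


class WithRedundantInit(BaseSerializer):
    # Only forwards its argument, so it adds nothing over the inherited method.
    def __init__(self, assign_cols_by_name):
        super().__init__(assign_cols_by_name)


class WithoutInit(BaseSerializer):
    # No __init__ defined: attribute lookup resolves to BaseSerializer.__init__.
    pass


a = WithRedundantInit(True)
b = WithoutInit(True)

# Both instances end up in the same state.
assert a.assign_cols_by_name == b.assign_cols_by_name

# The subclass without __init__ reuses the parent's function object.
assert WithoutInit.__init__ is BaseSerializer.__init__

# Signatures match here, so positional and keyword callers behave the same.
assert inspect.signature(WithRedundantInit.__init__) == inspect.signature(
    BaseSerializer.__init__
)
```

The signature comparison is the part worth repeating against the real classes before deleting anything: if a forwarding `__init__` accepts fewer parameters than its parent, or in a different order, removing it changes what callers may pass.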



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

