Yicong Huang created SPARK-55126:
------------------------------------

             Summary: Remove unused timezone and assign_cols_by_name from 
ArrowStreamArrowUDFSerializer
                 Key: SPARK-55126
                 URL: https://issues.apache.org/jira/browse/SPARK-55126
             Project: Spark
          Issue Type: Improvement
          Components: PySpark
    Affects Versions: 4.2.0
            Reporter: Yicong Huang


`ArrowStreamArrowUDFSerializer` stores `timezone` and `assign_cols_by_name` but 
never uses them:

{code:python}
class ArrowStreamArrowUDFSerializer(ArrowStreamSerializer):
    def __init__(self, timezone, safecheck, assign_cols_by_name, arrow_cast):
        super().__init__()
        self._timezone = timezone              # never used
        self._safecheck = safecheck
        self._assign_cols_by_name = assign_cols_by_name  # never used
        self._arrow_cast = arrow_cast
{code}
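
One way to confirm that claim is to compare the `self` attributes a class assigns with those it reads. The following is a minimal sketch using the standard-library `ast` module; the `SOURCE` string is a trimmed stand-in for the real class, and its `_cast` method is hypothetical, included only so that some attributes are read. The check sees a single class body, so attributes read only by subclasses would need a separate look.

```python
import ast

# Trimmed stand-in for the real class; `_cast` is a hypothetical method
# that exists only so that some attributes are actually read.
SOURCE = '''
class ArrowStreamArrowUDFSerializer:
    def __init__(self, timezone, safecheck, assign_cols_by_name, arrow_cast):
        self._timezone = timezone
        self._safecheck = safecheck
        self._assign_cols_by_name = assign_cols_by_name
        self._arrow_cast = arrow_cast

    def _cast(self, arr):
        return (arr, self._safecheck, self._arrow_cast)
'''


def unread_self_attributes(source):
    """Return the names assigned as `self.<name>` but never read back."""
    stored, loaded = set(), set()
    for node in ast.walk(ast.parse(source)):
        is_self_attr = (
            isinstance(node, ast.Attribute)
            and isinstance(node.value, ast.Name)
            and node.value.id == "self"
        )
        if not is_self_attr:
            continue
        if isinstance(node.ctx, ast.Store):
            stored.add(node.attr)
        elif isinstance(node.ctx, ast.Load):
            loaded.add(node.attr)
    return sorted(stored - loaded)


unused = unread_self_attributes(SOURCE)
print(unused)
```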

Arrow serializers operate directly on Arrow arrays without pandas conversion, 
so these parameters are unnecessary. This also affects 
`ArrowBatchUDFSerializer` and `ArrowStreamAggArrowUDFSerializer`, which pass 
these unused parameters.
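
As a rough sketch of what the cleanup could look like (not the actual patch), the constructor might keep only the two parameters that are used. The base class below is a local stub standing in for the real `ArrowStreamSerializer` so the example runs without PySpark installed; the final signature is an assumption.

```python
class ArrowStreamSerializer:
    """Local stub standing in for the real PySpark base class (assumption)."""

    def __init__(self):
        pass


class ArrowStreamArrowUDFSerializer(ArrowStreamSerializer):
    # Assumed post-cleanup signature: timezone and assign_cols_by_name removed.
    def __init__(self, safecheck, arrow_cast):
        super().__init__()
        self._safecheck = safecheck
        self._arrow_cast = arrow_cast


ser = ArrowStreamArrowUDFSerializer(safecheck=True, arrow_cast=False)
```

Callers such as `ArrowBatchUDFSerializer` and `ArrowStreamAggArrowUDFSerializer` would then stop forwarding the two removed arguments.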
