[ 
https://issues.apache.org/jira/browse/SPARK-55126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-55126:
-------------------------------------

    Assignee: Yicong Huang

> Remove unused timezone and assign_cols_by_name from 
> ArrowStreamArrowUDFSerializer
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-55126
>                 URL: https://issues.apache.org/jira/browse/SPARK-55126
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 4.2.0
>            Reporter: Yicong Huang
>            Assignee: Yicong Huang
>            Priority: Major
>              Labels: pull-request-available
>
> `ArrowStreamArrowUDFSerializer` stores `timezone` and `assign_cols_by_name` 
> but never uses them:
> {code:python}
> class ArrowStreamArrowUDFSerializer(ArrowStreamSerializer):
>     def __init__(self, timezone, safecheck, assign_cols_by_name, arrow_cast):
>         super().__init__()
>         self._timezone = timezone              # never used
>         self._safecheck = safecheck
>         self._assign_cols_by_name = assign_cols_by_name  # never used
>         self._arrow_cast = arrow_cast
> {code}
> Arrow serializers operate directly on Arrow arrays without pandas conversion, 
> so these parameters are unnecessary. This also affects 
> `ArrowBatchUDFSerializer` and `ArrowStreamAggArrowUDFSerializer`, which pass 
> these unused parameters.
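
For illustration only, here is a minimal sketch of what the constructor might look like once the two unused parameters are dropped. It is not the actual patch: the `ArrowStreamSerializer` base class below is a stand-in stub so the snippet runs without Spark installed, and the final signature in PySpark may differ.

```python
class ArrowStreamSerializer:
    """Stand-in stub for PySpark's base class (assumption, not the real one)."""

    def __init__(self):
        pass


class ArrowStreamArrowUDFSerializer(ArrowStreamSerializer):
    """Hypothetical simplified form: only the parameters that are actually used."""

    def __init__(self, safecheck, arrow_cast):
        super().__init__()
        self._safecheck = safecheck    # kept: already stored in the original
        self._arrow_cast = arrow_cast  # kept: already stored in the original
        # timezone and assign_cols_by_name are no longer accepted or stored


# Callers would then construct the serializer without the removed arguments.
serializer = ArrowStreamArrowUDFSerializer(safecheck=True, arrow_cast=False)
```

Under this sketch, subclasses or callers such as `ArrowBatchUDFSerializer` and `ArrowStreamAggArrowUDFSerializer` would also stop forwarding `timezone` and `assign_cols_by_name`.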



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
