[ https://issues.apache.org/jira/browse/SPARK-55126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated SPARK-55126:
-----------------------------------
Labels: pull-request-available (was: )
> Remove unused timezone and assign_cols_by_name from ArrowStreamArrowUDFSerializer
> ---------------------------------------------------------------------------------
>
> Key: SPARK-55126
> URL: https://issues.apache.org/jira/browse/SPARK-55126
> Project: Spark
> Issue Type: Improvement
> Components: PySpark
> Affects Versions: 4.2.0
> Reporter: Yicong Huang
> Priority: Major
> Labels: pull-request-available
>
> `ArrowStreamArrowUDFSerializer` stores `timezone` and `assign_cols_by_name`
> but never uses them:
> {code:python}
> class ArrowStreamArrowUDFSerializer(ArrowStreamSerializer):
>     def __init__(self, timezone, safecheck, assign_cols_by_name, arrow_cast):
>         super().__init__()
>         self._timezone = timezone  # never used
>         self._safecheck = safecheck
>         self._assign_cols_by_name = assign_cols_by_name  # never used
>         self._arrow_cast = arrow_cast
> {code}
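> Arrow serializers operate directly on Arrow arrays without converting to
> pandas, so these parameters are unnecessary: `timezone` is needed only for
> timestamp conversion to and from pandas, and `assign_cols_by_name` only for
> matching pandas DataFrame columns to result fields by name. This also
> affects `ArrowBatchUDFSerializer` and `ArrowStreamAggArrowUDFSerializer`,
> which pass these unused parameters.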
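One way to check the "stores but never uses" claim is to scan the class for `self.<attr>` assignments that are never read. The following is a minimal sketch, assuming the check only needs to look inside a single class: it parses the constructor quoted in the issue as text with Python's standard `ast` module (so PySpark is not required), adds a hypothetical `_cast_options` method that reads `_safecheck` and `_arrow_cast` (the attributes the issue does not mark as unused), and compares the result with the proposed constructor that drops `timezone` and `assign_cols_by_name`. The helper `write_only_attributes` and the `_cast_options` method are illustrative names, not part of PySpark.

```python
import ast
import textwrap

# Constructor quoted in the issue, plus a hypothetical method that reads the
# attributes the issue does not flag as unused (_safecheck, _arrow_cast).
BEFORE = textwrap.dedent("""
    class ArrowStreamArrowUDFSerializer(ArrowStreamSerializer):
        def __init__(self, timezone, safecheck, assign_cols_by_name, arrow_cast):
            super().__init__()
            self._timezone = timezone
            self._safecheck = safecheck
            self._assign_cols_by_name = assign_cols_by_name
            self._arrow_cast = arrow_cast

        def _cast_options(self):  # hypothetical reader, not the real method
            return self._safecheck, self._arrow_cast
""")

# Proposed constructor with the two unused parameters removed.
AFTER = textwrap.dedent("""
    class ArrowStreamArrowUDFSerializer(ArrowStreamSerializer):
        def __init__(self, safecheck, arrow_cast):
            super().__init__()
            self._safecheck = safecheck
            self._arrow_cast = arrow_cast

        def _cast_options(self):  # hypothetical reader, not the real method
            return self._safecheck, self._arrow_cast
""")


def write_only_attributes(source, class_name):
    """Return names of self.<attr> that are assigned but never read in class_name."""
    tree = ast.parse(source)
    cls = next(
        node for node in ast.walk(tree)
        if isinstance(node, ast.ClassDef) and node.name == class_name
    )
    stored, loaded = set(), set()
    for node in ast.walk(cls):
        # Only direct attribute access on `self` counts; super().__init__ is skipped.
        if (
            isinstance(node, ast.Attribute)
            and isinstance(node.value, ast.Name)
            and node.value.id == "self"
        ):
            if isinstance(node.ctx, ast.Store):
                stored.add(node.attr)
            elif isinstance(node.ctx, ast.Load):
                loaded.add(node.attr)
    return stored - loaded


NAME = "ArrowStreamArrowUDFSerializer"
print(sorted(write_only_attributes(BEFORE, NAME)))  # ['_assign_cols_by_name', '_timezone']
print(sorted(write_only_attributes(AFTER, NAME)))   # []
```

Because it inspects only one class, this check misses reads in subclasses, in other modules, or through `getattr`; confirming that the parameters are safe to drop still requires reviewing callers such as `ArrowBatchUDFSerializer` and `ArrowStreamAggArrowUDFSerializer`.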
--
This message was sent by Atlassian Jira
(v8.20.10#820010)