Yicong Huang created SPARK-55126:
------------------------------------
Summary: Remove unused timezone and assign_cols_by_name from ArrowStreamArrowUDFSerializer
Key: SPARK-55126
URL: https://issues.apache.org/jira/browse/SPARK-55126
Project: Spark
Issue Type: Improvement
Components: PySpark
Affects Versions: 4.2.0
Reporter: Yicong Huang
`ArrowStreamArrowUDFSerializer` stores `timezone` and `assign_cols_by_name` but
never uses them:
{code:python}
class ArrowStreamArrowUDFSerializer(ArrowStreamSerializer):
    def __init__(self, timezone, safecheck, assign_cols_by_name, arrow_cast):
        super().__init__()
        self._timezone = timezone  # never used
        self._safecheck = safecheck
        self._assign_cols_by_name = assign_cols_by_name  # never used
        self._arrow_cast = arrow_cast
{code}
Arrow serializers operate directly on Arrow arrays without pandas conversion,
so these parameters are unnecessary. The cleanup also affects
`ArrowBatchUDFSerializer` and `ArrowStreamAggArrowUDFSerializer`, which pass
these unused parameters.
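
For illustration only, a minimal sketch of how the constructor might look once the two parameters are dropped. The base class here is a stand-in stub so the snippet runs without Spark, and the reduced signature is an assumption about the eventual change, not the actual patch:

```python
class ArrowStreamSerializer:
    """Stand-in stub for the real base class (assumption: lets the sketch run without Spark)."""

    def __init__(self):
        pass


class ArrowStreamArrowUDFSerializer(ArrowStreamSerializer):
    """Hypothetical post-cleanup version: keeps only the parameters not flagged as unused."""

    def __init__(self, safecheck, arrow_cast):
        super().__init__()
        self._safecheck = safecheck
        self._arrow_cast = arrow_cast


# Callers would no longer pass timezone or assign_cols_by_name.
serializer = ArrowStreamArrowUDFSerializer(safecheck=True, arrow_cast=False)
```

Subclasses and call sites that currently forward `timezone` and `assign_cols_by_name` would need the matching arguments removed in the same change.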
--
This message was sent by Atlassian Jira
(v8.20.10#820010)