Yicong Huang created SPARK-55506:
------------------------------------
Summary: Pass explicit input_type (Spark schema) to CogroupPandasUDFSerializer
Key: SPARK-55506
URL: https://issues.apache.org/jira/browse/SPARK-55506
Project: Spark
Issue Type: Sub-task
Components: PySpark
Affects Versions: 4.2.0
Reporter: Yicong Huang
Currently, worker.py constructs CogroupPandasUDFSerializer without an
input_type, so _input_type defaults to None. As a result,
ArrowBatchTransformer.to_pandas cannot use the Spark schema for type-aware
Arrow-to-pandas conversion in cogroup UDFs.

We should build the Spark schema in worker.py and pass it explicitly,
consistent with how other serializers (e.g. ArrowStreamPandasUDFSerializer,
ApplyInPandasWithStateSerializer) receive their input_type.
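
To illustrate why the None default matters, here is a minimal, self-contained
sketch in plain Python. It is not PySpark code: ToySerializer, to_rows, and the
dict-based Schema are hypothetical stand-ins, and the real
CogroupPandasUDFSerializer and ArrowBatchTransformer.to_pandas signatures are
not reproduced here. The sketch only shows the general pattern that a
conversion step can apply declared column types when the caller passes a
schema, and has to pass values through untouched when it receives None.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for a Spark schema: column name -> converter.
Schema = Dict[str, Callable[[object], object]]


def to_rows(batch: Dict[str, List[object]], input_type: Optional[Schema]) -> List[dict]:
    """Toy analogue of a type-aware conversion step (assumption, not PySpark).

    With input_type=None every value passes through unchanged. With a schema,
    each non-null value is coerced by the converter declared for its column.
    """
    names = list(batch)
    num_rows = len(batch[names[0]]) if names else 0
    rows = []
    for i in range(num_rows):
        row = {}
        for name in names:
            value = batch[name][i]
            if input_type is not None and name in input_type and value is not None:
                value = input_type[name](value)
            row[name] = value
        rows.append(row)
    return rows


class ToySerializer:
    """Hypothetical serializer whose input_type defaults to None."""

    def __init__(self, input_type: Optional[Schema] = None):
        self._input_type = input_type

    def load(self, batch: Dict[str, List[object]]) -> List[dict]:
        return to_rows(batch, self._input_type)


batch = {"id": ["1", "2"], "score": ["0.5", None]}

# Constructed without input_type: the conversion has no schema to consult.
without_schema = ToySerializer().load(batch)

# Constructed with an explicit schema: the conversion is type-aware.
with_schema = ToySerializer(input_type={"id": int, "score": float}).load(batch)
```

In this toy model, without_schema keeps the raw string values, while
with_schema yields int and float values and leaves nulls as None. The proposed
change is analogous: the caller (worker.py) supplies the schema so the
conversion no longer has to work without it.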