Yicong Huang created SPARK-55506:
------------------------------------

             Summary: Pass explicit input_type (Spark schema) to CogroupPandasUDFSerializer
                 Key: SPARK-55506
                 URL: https://issues.apache.org/jira/browse/SPARK-55506
             Project: Spark
          Issue Type: Sub-task
          Components: PySpark
    Affects Versions: 4.2.0
            Reporter: Yicong Huang


Currently, CogroupPandasUDFSerializer is constructed in worker.py without 
input_type, so _input_type defaults to None. As a result, 
ArrowBatchTransformer.to_pandas cannot use the Spark schema for type-aware 
Arrow-to-pandas conversion in cogroup UDFs.

We should explicitly build and pass the Spark schema from worker.py, consistent 
with how other serializers (e.g. ArrowStreamPandasUDFSerializer, 
ApplyInPandasWithStateSerializer) receive their input_type.
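A minimal, self-contained sketch of the pattern described above, in plain Python. Everything in it is hypothetical: SerializerSketch, to_columns, make_serializer, the list-of-(name, type) schema stand-in, and the converter table are illustrative names only, not PySpark's actual CogroupPandasUDFSerializer, ArrowBatchTransformer.to_pandas, or worker.py code. It only shows why an input_type that defaults to None leaves nothing to drive type-aware conversion, and how building the schema on the constructing side and passing it explicitly restores it. It does not model how cogroup's left and right inputs map onto the schema.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical stand-in for a Spark schema: ordered (field name, type name) pairs.
# Real worker code would use pyspark.sql.types.StructType instead.
SchemaSketch = List[Tuple[str, str]]

# Hypothetical type-aware converters, applied only when a schema is known.
CONVERTERS: Dict[str, Callable[[Any], Any]] = {
    "int": int,
    "double": float,
    "string": str,
}


class SerializerSketch:
    """Hypothetical serializer whose input_type is optional, defaulting to None."""

    def __init__(self, input_type: Optional[SchemaSketch] = None) -> None:
        self._input_type = input_type

    def to_columns(self, batch: Dict[str, List[Any]]) -> Dict[str, List[Any]]:
        if self._input_type is None:
            # No schema: nothing drives conversion, so values pass through unchanged.
            return {name: list(values) for name, values in batch.items()}
        # Schema known: convert each declared field by its type, preserving nulls.
        out: Dict[str, List[Any]] = {}
        for name, type_name in self._input_type:
            convert = CONVERTERS.get(type_name, lambda v: v)
            out[name] = [None if v is None else convert(v) for v in batch[name]]
        return out


def make_serializer(input_type: SchemaSketch) -> SerializerSketch:
    # Hypothetical worker-side construction: build the schema first and pass it
    # explicitly instead of relying on the None default.
    return SerializerSketch(input_type=input_type)


batch = {"id": ["1", None], "v": ["2.5", "3"]}
print(SerializerSketch().to_columns(batch))
# {'id': ['1', None], 'v': ['2.5', '3']}
print(make_serializer([("id", "int"), ("v", "double")]).to_columns(batch))
# {'id': [1, None], 'v': [2.5, 3.0]}
```

The design choice mirrors the issue's point: the serializer can only be as type-aware as the schema it is handed, so the fix belongs where the serializer is constructed, not inside the conversion routine.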



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
