[ https://issues.apache.org/jira/browse/SPARK-53029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Allison Wang updated SPARK-53029:
---------------------------------
    Description: 
Address this follow up discussion [https://github.com/apache/spark/pull/51692#discussion_r2241413025]

  was:
Address this follow up discussion [https://github.com/apache/spark/pull/51692#discussion_r2241413025]

Currently:

In [7]: import pyarrow as pa

In [8]: @arrow_udtf(returnType="x int")
   ...: class MyArrowUDTF:
   ...:     def eval(self, batch: pa.RecordBatch):
   ...:         yield batch.column(0)

In [10]: MyArrowUDTF(df.asTable()).show()

will fail with

pyspark.errors.exceptions.base.PySparkRuntimeError: [UDTF_ARROW_TYPE_CONVERSION_ERROR] Cannot convert the output value of the input '[ 0 ]' with type 'struct<x:int>' to the specified return type of the column: 'struct<x: int32>'. Please check if the data types match and try again.

However, this works in `@arrow_udf`. We should make this behavior consistent.


> Support return type coercion for Arrow Python UDTFs
> ---------------------------------------------------
>
>                 Key: SPARK-53029
>                 URL: https://issues.apache.org/jira/browse/SPARK-53029
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 4.1.0
>            Reporter: Allison Wang
>            Priority: Major
>
> Address this follow up discussion 
> [https://github.com/apache/spark/pull/51692#discussion_r2241413025]