[ 
https://issues.apache.org/jira/browse/SPARK-55723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yicong Huang updated SPARK-55723:
---------------------------------
    Description: 
The `enforce_schema` method in `ArrowBatchTransformer` raises 
`RESULT_COLUMNS_MISMATCH_FOR_ARROW_UDTF` when Arrow type casting fails. This 
error class is UDTF-specific, but `enforce_schema` is a general-purpose utility 
that will be shared across other Arrow-based UDF types (e.g., scalar Arrow 
UDFs).

We should introduce a more general error class (e.g., 
`RESULT_COLUMNS_MISMATCH_FOR_ARROW_UDF`) and use it in `enforce_schema`, so the 
error message is appropriate regardless of the calling UDF type.

See: https://github.com/apache/spark/pull/54296#discussion_r2861772381
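
A minimal sketch of the proposal, not the actual PySpark code: the function below is a plain-Python stand-in for `enforce_schema` that reports failed casts under a caller-neutral error class. `SchemaMismatchError`, the dict-of-lists column layout, and the `error_class` parameter are assumptions made for illustration. The default class name is only the "e.g." suggested above, not a confirmed Spark error class.

```python
from typing import Any, Callable, Dict, List

# Hypothetical name, taken from the "e.g." in the description above.
GENERAL_ERROR_CLASS = "RESULT_COLUMNS_MISMATCH_FOR_ARROW_UDF"


class SchemaMismatchError(Exception):
    """Hypothetical stand-in for the exception PySpark would raise."""

    def __init__(self, error_class: str, message: str) -> None:
        super().__init__(f"[{error_class}] {message}")
        self.error_class = error_class


def enforce_schema(
    columns: Dict[str, List[Any]],
    target: Dict[str, Callable[[Any], Any]],
    error_class: str = GENERAL_ERROR_CLASS,
) -> Dict[str, List[Any]]:
    """Cast each column to its target type; report failures under `error_class`."""
    # Column names and order must match the target schema exactly.
    if list(columns) != list(target):
        raise SchemaMismatchError(
            error_class,
            f"expected columns {list(target)}, got {list(columns)}",
        )
    result: Dict[str, List[Any]] = {}
    for name, cast in target.items():
        try:
            result[name] = [cast(value) for value in columns[name]]
        except (TypeError, ValueError) as exc:
            # The error class no longer names a specific UDF type.
            raise SchemaMismatchError(
                error_class, f"cannot cast column '{name}': {exc}"
            ) from exc
    return result
```

With this shape, any Arrow-based UDF path could call the shared helper and get the same error class without UDTF-specific wording. A caller that still needs its own class could pass `error_class` explicitly.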

  was:
Currently, the enforce_schema method in ArrowTableConversions uses 
UDTF-specific error messages (e.g., RESULT_COLUMNS_MISMATCH_FOR_ARROW_UDTF) 
when casting fails. As we consolidate more Arrow UDF code paths to share this 
function, the error messages should be generalized to be user-friendly for all 
UDF types, not just UDTF.

This is a follow-up from 
https://github.com/apache/spark/pull/54296#discussion_r2861772381.

        Summary: Generalize RESULT_COLUMNS_MISMATCH_FOR_ARROW_UDTF error in 
enforce_schema for all UDF types  (was: Unify UDTF error messages with other 
UDFs in Arrow-based execution)

> Generalize RESULT_COLUMNS_MISMATCH_FOR_ARROW_UDTF error in enforce_schema for 
> all UDF types
> -------------------------------------------------------------------------------------------
>
>                 Key: SPARK-55723
>                 URL: https://issues.apache.org/jira/browse/SPARK-55723
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 4.1.2
>            Reporter: Yicong Huang
>            Priority: Minor



