[jira] [Assigned] (SPARK-56929) Pass prefers_large_types when building expected schema for Arrow grouped/cogrouped map UDFs

Ruifeng Zheng (Jira) Mon, 18 May 2026 18:33:10 -0700


     [ 
https://issues.apache.org/jira/browse/SPARK-56929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Ruifeng Zheng reassigned SPARK-56929:
-------------------------------------

    Assignee: Yicong Huang

> Pass prefers_large_types when building expected schema for Arrow 
> grouped/cogrouped map UDFs
> -------------------------------------------------------------------------------------------
>
>                 Key: SPARK-56929
>                 URL: https://issues.apache.org/jira/browse/SPARK-56929
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 5.0.0
>            Reporter: Yicong Huang
>            Assignee: Yicong Huang
>            Priority: Major
>              Labels: pull-request-available
>
> *Bug*: In {{python/pyspark/worker.py}}, the per-field 
> {{expected_cols_and_types}} schema for the three Arrow map eval types
> * {{SQL_GROUPED_MAP_ARROW_UDF}}
> * {{SQL_GROUPED_MAP_ARROW_ITER_UDF}}
> * {{SQL_COGROUPED_MAP_ARROW_UDF}}
> is built by calling {{to_arrow_type(col.dataType, timezone="UTC")}} on each 
> field without forwarding 
> {{prefers_large_types=runner_conf.use_large_var_types}}, while the 
> corresponding {{arrow_return_type}} (used to derive the actual result schema) 
> *is* built with the flag.
> *Effect*: When {{spark.sql.execution.arrow.useLargeVarTypes=true}}, fields of 
> type {{StringType}}/{{BinaryType}} are produced as Arrow 
> {{large_string}}/{{large_binary}} in the result table but expected as regular 
> {{string}}/{{binary}}. {{verify_arrow_result}} then raises a spurious 
> {{RESULT_COLUMN_TYPES_MISMATCH}}.
> *Fix*: Pass {{prefers_large_types=runner_conf.use_large_var_types}} when 
> constructing {{expected_cols_and_types}} at all three sites, matching the 
> {{arrow_return_type}} construction immediately above.
> This is a prerequisite for SPARK-56608 (Migrate {{verify_arrow_result}} 
> checks into {{enforce_schema}}).
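
The mismatch described above can be illustrated with a small pure-Python model. This is only a sketch, not the code in {{python/pyspark/worker.py}}: {{to_arrow_type_name}} and {{mismatched_columns}} are hypothetical stand-ins that use type names as plain strings, and the real {{to_arrow_type}} and {{verify_arrow_result}} are not reproduced here. It assumes only what the issue states: the result schema is always built with the flag, while the expected schema is built without it until the fix.

```python
# Toy stand-in for the schema check described in the issue; NOT the PySpark code.
# All function names below are hypothetical simplifications.

def to_arrow_type_name(spark_type: str, prefers_large_types: bool = False) -> str:
    """Map a Spark type name to an Arrow type name (simplified model)."""
    if spark_type == "StringType":
        return "large_string" if prefers_large_types else "string"
    if spark_type == "BinaryType":
        return "large_binary" if prefers_large_types else "binary"
    if spark_type == "LongType":
        return "int64"
    raise ValueError(f"unsupported type in this sketch: {spark_type}")


def mismatched_columns(fields, use_large_var_types, forward_flag):
    """Return the names of columns whose expected type differs from the produced type."""
    mismatches = []
    for name, spark_type in fields:
        # Result schema: always built with the flag, as arrow_return_type is.
        actual = to_arrow_type_name(spark_type, prefers_large_types=use_large_var_types)
        # Expected schema: built with the flag only when it is forwarded.
        if forward_flag:
            expected = to_arrow_type_name(spark_type, prefers_large_types=use_large_var_types)
        else:
            expected = to_arrow_type_name(spark_type)
        if expected != actual:
            mismatches.append(name)
    return mismatches


fields = [("id", "LongType"), ("name", "StringType"), ("payload", "BinaryType")]

# Flag on, not forwarded: string and binary columns are reported as mismatched.
before_fix = mismatched_columns(fields, use_large_var_types=True, forward_flag=False)
# Flag on, forwarded: expected and produced types agree.
after_fix = mismatched_columns(fields, use_large_var_types=True, forward_flag=True)
# Flag off: the missing argument is harmless, which is why the bug stays hidden.
flag_off = mismatched_columns(fields, use_large_var_types=False, forward_flag=False)

print(before_fix, after_fix, flag_off)
```

In this model only the variable-width columns are affected, and only when the large-type setting is enabled, which is consistent with the issue's description of the error as spurious.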



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

