[
https://issues.apache.org/jira/browse/SPARK-57484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chao Sun updated SPARK-57484:
-----------------------------
Shepherd: Chao Sun (was: Sun Chao)
> ColumnarToRowExec can crash codegen when columnar data contains unexpected
> nulls
> --------------------------------------------------------------------------------
>
> Key: SPARK-57484
> URL: https://issues.apache.org/jira/browse/SPARK-57484
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 4.0.0, 3.5.8
> Reporter: Ziting Shen
> Assignee: Ziting Shen
> Priority: Major
> Labels: pull-request-available
>
> ColumnarToRowExec may materialize null values from a columnar batch even when
> the planned output attribute is non-nullable, so downstream row codegen can
> skip null checks and fail with errors such as UTF8String.getBaseObject() NPEs.
> One observed failing plan had this shape:
> HashAggregate
> HashAggregate
> ...
> ColumnarToRow
> Scan parquet ...
> In that case the Parquet reader produced a physical null for a column whose
> planned output attribute was non-nullable, and downstream generated row code
> failed with a UTF8String.getBaseObject() NullPointerException.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]