[ 
https://issues.apache.org/jira/browse/SPARK-57484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ziting Shen updated SPARK-57484:
--------------------------------
    Description: 
ColumnarToRowExec may materialize null values from a columnar batch even when 
the planned output attribute is non-nullable, so downstream row codegen can 
skip null checks and fail with errors such as UTF8String.getBaseObject() NPEs.

One observed failing plan had this shape:

HashAggregate
  HashAggregate
    ...
      ColumnarToRow
        Scan parquet ...

In that case the Parquet reader produced a physical null for a column whose 
planned output attribute was non-nullable, and downstream generated row code 
failed with a UTF8String.getBaseObject() NullPointerException.

  was:{{ColumnarToRowExec may materialize null values from a columnar batch 
even when the planned output attribute is non-nullable, so downstream row 
codegen can skip null checks and fail with errors such as 
UTF8String.getBaseObject() NPEs.}}


> ColumnarToRowExec can crash codegen when columnar data contains unexpected 
> nulls
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-57484
>                 URL: https://issues.apache.org/jira/browse/SPARK-57484
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 4.0.0, 3.5.8
>            Reporter: Ziting Shen
>            Priority: Major
>              Labels: pull-request-available
>
> ColumnarToRowExec may materialize null values from a columnar batch even when 
> the planned output attribute is non-nullable, so downstream row codegen can 
> skip null checks and fail with errors such as UTF8String.getBaseObject() NPEs.
>
> One observed failing plan had this shape:
>
> HashAggregate
>   HashAggregate
>     ...
>       ColumnarToRow
>         Scan parquet ...
>
> In that case the Parquet reader produced a physical null for a column whose 
> planned output attribute was non-nullable, and downstream generated row code 
> failed with a UTF8String.getBaseObject() NullPointerException.
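
The failure mode described above can be illustrated with a small toy model. This is a sketch in plain Python, not Spark code: Field, make_reader and enforce_nullability are hypothetical names invented for the illustration, and Python's AttributeError on None stands in for the UTF8String.getBaseObject() NullPointerException. The guard at the end is only one conceivable way to surface the problem at the columnar-to-row boundary; it is an assumption and not necessarily what the actual fix does.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a planned output attribute."""
    name: str
    nullable: bool


def make_reader(field: Field, ordinal: int) -> Callable[[List[Optional[str]]], Optional[bytes]]:
    """Toy 'codegen': build an accessor for one column of a row.

    The null check is emitted only when the plan marks the field nullable,
    mirroring how generated row code can skip null checks.
    """
    if field.nullable:
        def read(row):
            value = row[ordinal]
            return None if value is None else value.encode("utf-8")
    else:
        def read(row):
            # No null check here: the plan promised a value.
            return row[ordinal].encode("utf-8")
    return read


def enforce_nullability(row: List[Optional[str]], fields: List[Field]) -> List[Optional[str]]:
    """Hypothetical guard: fail fast with a clear message when a physical
    null shows up in a column that the plan declares non-nullable."""
    for ordinal, field in enumerate(fields):
        if not field.nullable and row[ordinal] is None:
            raise ValueError(f"null in non-nullable column '{field.name}'")
    return row


if __name__ == "__main__":
    fields = [Field("name", nullable=False)]
    read_name = make_reader(fields[0], 0)

    print(read_name(["abc"]))          # well-formed row: works
    try:
        read_name([None])              # physical null, check was skipped
    except AttributeError as exc:
        print("downstream failure:", exc)
    try:
        enforce_nullability([None], fields)
    except ValueError as exc:
        print("guarded failure:", exc)
```

In this model the unguarded reader fails deep inside the accessor with an error that says nothing about the schema mismatch, while the guard reports the offending column by name. Whether the real remedy is a guard, relaxing the attribute to nullable, or something else is outside what this sketch can show.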


