pchintar opened a new pull request, #4458:
URL: https://github.com/apache/datafusion-comet/pull/4458
## Which issue does this PR close?
Closes #4457.
## Rationale for this change
Queries containing `NullType` columns could fail when
`spark.comet.exec.localTableScan.enabled=true`.
For example:
```sql
SELECT max(col) FROM VALUES
(NULL),
(NULL)
AS t(col)
```
could fail during Arrow schema conversion in `CometLocalTableScanExec`
instead of cleanly falling back to Spark execution.
## What changes are included in this PR?
This PR adds an explicit support-level check for `NullType` columns in
`CometLocalTableScanExec`.
When a `LocalTableScanExec` contains `NullType`, Comet now marks the
operator as unsupported and falls back to Spark execution instead of attempting
native Arrow conversion.
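
As a rough illustration of the kind of check described above (not the code from this PR), the sketch below scans a schema for `NullType` and returns a fallback reason when one is found. The object `NullTypeCheck` and the helpers `containsNullType` and `fallbackReason` are hypothetical names; only the `org.apache.spark.sql.types` classes are real Spark API. Recursing into nested arrays, maps, and structs is an assumption, since the PR description only mentions `NullType` columns.

```scala
import org.apache.spark.sql.types._

object NullTypeCheck {
  // Hypothetical helper (not taken from the PR): true if `dt` is NullType
  // or contains NullType inside a nested array, map, or struct.
  def containsNullType(dt: DataType): Boolean = dt match {
    case NullType => true
    case ArrayType(elementType, _) => containsNullType(elementType)
    case MapType(keyType, valueType, _) =>
      containsNullType(keyType) || containsNullType(valueType)
    case s: StructType => s.fields.exists(f => containsNullType(f.dataType))
    case _ => false
  }

  // Hypothetical stand-in for a support-level decision:
  // Some(reason) means "fall back to Spark", None means "no NullType found".
  def fallbackReason(schema: StructType): Option[String] =
    schema.fields.collectFirst {
      case f if containsNullType(f.dataType) =>
        s"column '${f.name}' has unsupported type ${f.dataType.simpleString}"
    }
}
```

Returning a reason rather than a bare boolean keeps the fallback explainable, which helps when diagnosing why an operator was not run natively.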
A regression test has also been added for aggregate queries over `NullType`
local table scans.
## How are these changes tested?
Added a regression test in `CometExecSuite` covering:
```sql
SELECT max(col) FROM VALUES (NULL), (NULL) AS t(col)
```
with:
```text
spark.comet.exec.localTableScan.enabled=true
```
Before the fix, the query could fail with:
```text
java.lang.UnsupportedOperationException:
Unsupported data type: [org.apache.spark.sql.types.NullType$] void
```
After the fix, the query cleanly falls back to Spark execution and
returns the expected result.
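
For a quick manual check outside the test suite, something like the following sketch could be run against a session that has Comet enabled. The object `NullTypeRepro` and the method `maxOverNullColumnIsNull` are hypothetical names, and the sketch uses only the public `SparkSession` API rather than the helpers in `CometExecSuite`. It assumes the caller supplies a `SparkSession` already configured with the Comet plugin.

```scala
import org.apache.spark.sql.SparkSession

object NullTypeRepro {
  // Hypothetical helper: runs the query from this PR with the local table scan
  // option enabled and reports whether the single result row holds NULL.
  def maxOverNullColumnIsNull(spark: SparkSession): Boolean = {
    spark.conf.set("spark.comet.exec.localTableScan.enabled", "true")
    val rows = spark
      .sql("SELECT max(col) FROM VALUES (NULL), (NULL) AS t(col)")
      .collect()
    // max over an all-NULL column yields exactly one row containing NULL.
    rows.length == 1 && rows(0).isNullAt(0)
  }
}
```

Before the fix this call would be expected to throw the exception shown above instead of returning; after the fix it should return `true`.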
Test command used:
```bash
mvn -pl spark -DwildcardSuites=org.apache.comet.exec.CometExecSuite test
```