andygrove opened a new issue, #3300:
URL: https://github.com/apache/datafusion-comet/issues/3300
## Summary
When native columnar-to-row conversion is enabled by default (PR #3299),
several Spark SQL tests fail across all supported Spark versions (3.4, 3.5,
4.0). The failures fall into distinct categories that need to be addressed
before this feature can be enabled by default.
## Related PR
- #3299 - Enable native columnar-to-row by default
## Test Failures by Category
### 1. SparkPlanSuite - ColumnarToRowExec Canonicalization
**Affected Spark versions:** 3.4, 3.5, 4.0
**Failed test:**
- `SPARK-37779: ColumnarToRowExec should be canonicalizable after being (de)serialized`
**Error:**
```
java.util.NoSuchElementException: None.get
  at org.apache.spark.sql.execution.SparkPlanSuite.$anonfun$new$27(SparkPlanSuite.scala:135)
```
**Root cause:** The test expects a `ColumnarToRowExec` node in the plan, but
`CometNativeColumnarToRow` is being used instead. The lookup for the standard
Spark node therefore comes back empty, and calling `.get` on the resulting
`None` throws the `NoSuchElementException` shown above.
---
### 2. WholeStageCodegenSuite (Spark 4.0 only)
**Affected Spark versions:** 4.0
**Failed tests:**
- `Sort should be included in WholeStageCodegen`
- `Skip splitting consume function when parameter number exceeds JVM limit`
- `SPARK-26572: evaluate non-deterministic expressions for aggregate results`
**Root cause:** Inserting `CometNativeColumnarToRow` changes the plan
structure, which breaks tests that verify WholeStageCodegen behavior.
---
### 3. BucketedReadWithoutHiveSupportSuite
**Affected Spark versions:** 3.4, 3.5, 4.0
**Failed tests:**
- `SPARK-29655 Read bucketed tables obeys spark.sql.shuffle.partitions`
- `SPARK-32767 Bucket join should work if SHUFFLE_PARTITIONS larger than bucket number`
- `bucket coalescing eliminates shuffle`
**Error:**
```
expected SortMergeJoinExec, but found
CometNativeColumnarToRow
+- CometSortMergeJoin [i#15707, j#15708], [i#15713, j#15714], Inner
:- CometSort ...
```
**Root cause:** Tests use pattern matching to find `SortMergeJoinExec` nodes
in the plan, but now find `CometNativeColumnarToRow + CometSortMergeJoin`
instead.
---
### 4. BucketedReadWithHiveSupportSuite
**Affected Spark versions:** 3.4, 3.5
**Failed tests:** Same 3 tests as BucketedReadWithoutHiveSupportSuite
**Root cause:** Same as above - plan structure assertions expect
Spark-native operators.
---
## Summary of Failed Jobs
| Job | Spark Version | Failed Suite |
|-----|---------------|--------------|
| spark-sql-auto-sql_core-1 | 3.4.3 | SparkPlanSuite |
| spark-sql-auto-sql_core-1 | 3.5.7 | SparkPlanSuite |
| spark-sql-auto-sql_core-1 | 4.0.1 | SparkPlanSuite, WholeStageCodegenSuite |
| spark-sql-auto-sql_core-3 | 3.4.3 | BucketedReadWithoutHiveSupportSuite |
| spark-sql-auto-sql_core-3 | 3.5.7 | BucketedReadWithoutHiveSupportSuite |
| spark-sql-auto-sql_core-3 | 4.0.1 | BucketedReadWithoutHiveSupportSuite |
| spark-sql-auto-sql_hive-1 | 3.4.3 | BucketedReadWithHiveSupportSuite |
| spark-sql-auto-sql_hive-1 | 3.5.7 | BucketedReadWithHiveSupportSuite |
| spark-sql-native_iceberg_compat-sql_core-1 | 3.5.7 | SparkPlanSuite |
| spark-sql-native_iceberg_compat-sql_core-3 | 3.5.7 | BucketedReadWithoutHiveSupportSuite |
| spark-sql-native_iceberg_compat-sql_hive-1 | 3.5.7 | BucketedReadWithHiveSupportSuite |
## Potential Solutions
1. **For SparkPlanSuite/WholeStageCodegenSuite failures:**
- Disable native C2R for these specific test suites
- Or modify the tests to account for Comet-replaced operators
2. **For BucketedReadSuite failures:**
- The tests could be updated to recognize `CometSortMergeJoin` as a valid
replacement for `SortMergeJoinExec`
- Or disable native C2R when running these specific tests
- Or use a test helper that unwraps Comet operators when checking plan
structure
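The "test helper that unwraps Comet operators" option could look roughly like the sketch below. It is a language-neutral toy model written in Python, not Spark or Comet code: `PlanNode`, `unwrap_comet`, `find_first`, and the `COMET_TO_SPARK` mapping are all hypothetical names introduced for illustration, and the `CometSort` to `SortExec` pairing is an assumption. The idea is to drop the `CometNativeColumnarToRow` transition and rename Comet operators to their Spark counterparts before a structural assertion runs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class PlanNode:
    """Toy stand-in for a physical plan node (not a Spark class)."""
    name: str
    children: List["PlanNode"] = field(default_factory=list)


# Assumed for illustration: which Comet operator stands in for which Spark one.
COMET_TO_SPARK: Dict[str, str] = {
    "CometSortMergeJoin": "SortMergeJoinExec",
    "CometSort": "SortExec",
}

# Transition nodes that a plan-structure assertion can safely ignore.
TRANSITIONS: Set[str] = {"CometNativeColumnarToRow"}


def unwrap_comet(plan: PlanNode) -> PlanNode:
    """Return a copy of the plan with transitions removed and Comet names mapped."""
    if plan.name in TRANSITIONS and len(plan.children) == 1:
        # Skip the transition node and continue with its only child.
        return unwrap_comet(plan.children[0])
    mapped_name = COMET_TO_SPARK.get(plan.name, plan.name)
    return PlanNode(mapped_name, [unwrap_comet(c) for c in plan.children])


def find_first(plan: PlanNode, name: str) -> Optional[PlanNode]:
    """Depth-first search for the first node with the given name."""
    if plan.name == name:
        return plan
    for child in plan.children:
        hit = find_first(child, name)
        if hit is not None:
            return hit
    return None


# Plan shaped like the one in the BucketedReadSuite error message.
plan = PlanNode(
    "CometNativeColumnarToRow",
    [PlanNode("CometSortMergeJoin", [PlanNode("CometSort"), PlanNode("CometSort")])],
)

# The raw plan has no SortMergeJoinExec; the unwrapped plan does.
print(find_first(plan, "SortMergeJoinExec") is None)
print(find_first(unwrap_comet(plan), "SortMergeJoinExec") is not None)
```

A real helper would live in the Scala test code and match on operator classes rather than names, but the normalization step would be the same: unwrap first, then run the existing assertion against the normalized plan.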
## Notes
- Runs with the `native_comet` config pass all tests, which suggests the
failures are specific to the `auto` (extended) mode and the
`native_iceberg_compat` mode
- All failures are related to plan structure assertions that expect vanilla
Spark operators rather than their Comet equivalents