mbutrovich commented on PR #4393:
URL:
https://github.com/apache/datafusion-comet/pull/4393#issuecomment-4605953974
## CI failure characterization (`logs_71941792087`, 2026-06-02)
**Total: 156 failures**, all confined to upstream **Spark SQL `core`
shards** (Spark 3.5.8 and 4.0.2). Clean across the board: Iceberg integration,
catalyst, hive, TPC-H/DS, Rust, lint, native builds, and all of Comet's own
exec/expressions/shuffle/scans suites. This confirms the big-blast buckets are
resolved: B1 (NullType), B2 (TimeType), B3b (Iceberg silent corruption), B4
(window-suite assertion). The ~1,200 -> 156 drop matches the incremental-fix
plan.
Failures by shard:
| Shard | Count |
|---|---|
| `core-1` Spark 3.5 | 59 |
| `core-1` Spark 4.0 | 49 |
| `core-3` Spark 4.0 | 17 |
| `core-2` Spark 3.5 | 12 |
| `core-2` Spark 4.0 | 12 |
| `core-3` Spark 3.5 | 7 |
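As a quick sanity check on the table, the per-shard counts can be re-added and rolled up by shard and by Spark line. This is only a sketch over the numbers above; the dictionary layout and variable names are illustrative and not part of any CI tooling.

```python
# Failure counts per (shard, Spark line), copied from the table above.
shard_failures = {
    ("core-1", "3.5"): 59,
    ("core-1", "4.0"): 49,
    ("core-3", "4.0"): 17,
    ("core-2", "3.5"): 12,
    ("core-2", "4.0"): 12,
    ("core-3", "3.5"): 7,
}

total = sum(shard_failures.values())

# Roll the same counts up two ways: by Spark line and by shard.
by_version = {}
by_shard = {}
for (shard, version), count in shard_failures.items():
    by_version[version] = by_version.get(version, 0) + count
    by_shard[shard] = by_shard.get(shard, 0) + count

print(total)       # 156
print(by_version)  # {'3.5': 78, '4.0': 78}
print(by_shard)    # {'core-1': 108, 'core-3': 24, 'core-2': 24}
```

The rows do add up to 156. The two Spark lines carry 78 failures each, and `core-1` accounts for 108 of the total.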
### Buckets (mapped to the existing B-scheme)
**B5 plan-shape / explain / metrics assertions (~55, largest single-cause bucket, not
Comet defects).** Tests grep the physical plan for Spark-only nodes or check
codegen/SQL metrics Comet does not emit. Now that LocalTableScan is
Comet-native, the plan shows `CometNativeColumnarToRow` / `CometHashAggregate`
where the test expects `WholeStageCodegen`, `FilterExec`, `LocalTableScanExec`,
etc. Examples: `SPARK-19471 AggregationIterator` (8), `basic` / `SPARK-29894
Codegen Stage Id` / `SPARK-32615 SQLMetrics` (`SQLAppStatusListenerSuite`, 12),
Python UDF push-down (`BatchEvalPythonExecSuite`, 8), `join strategy hint` (4),
`ReplaceNullWithFalse` SPARK-25860/33847 (4, "is not LocalTableScanExec"),
`Support ExplainMode`, `SPARK-37371 UnionExec columnar`, `propagate empty
relation`, `repartitionByRange`, `full scan`. This is the deferred B5; needs a
skip-list or per-test `localTableScan.enabled=false`.
**B7 long tail (~62).** Several sub-groups, each a pre-existing
Comet/Spark-compat gap now exposed because LocalTableScan routes these through
Comet:
- collect_set/collect_list ordering (20): `collect functions`, `collect
functions structs`, `should be able to cast...`, `SPARK-17641`. Comet returns
`[1,3,2]` vs Spark `[1,2,3]`. Unordered-set semantics; cosmetic but fails
golden comparison.
- Math ULP diffs (18): `asinh`, `cosh`, `tan`, `cot`, `atanh`, `cbrt`,
`exp`, `pow/power`, `atan2`. JVM-libm vs Rust-libm.
- `bit_length` / `octet_length` on BinaryType (4, `SPARK-36751`):
DataFusion's string-only UDF rejects Binary input.
- Error-class / exception-wrapper mismatches (~18): Comet throws
`CometNativeException` / `SparkException` where Spark expects a specific
exception class or error-class format. Affected:
`datetime-formatting-legacy` (`IllegalArgumentException` vs Comet native),
`function to_date` / `to_unix_timestamp` (`SparkUpgradeException` expected),
`try_aggregates` / `ansi/try_aggregates` (`DIVIDE_BY_ZERO` error-class format),
`to_binary hex` (`CONVERSION_INVALID_INPUT`), `FAILED_EXECUTE_UDF`. Also counted
here, although it is a value difference rather than an exception mismatch:
`null IN ()` returns `false` instead of `null`.
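The ordering and ULP sub-groups both come down to comparisons that are stricter than the semantics require. A minimal sketch of the two tolerant comparisons follows, assuming plain Python values. The helper names `same_elements` and `ulp_distance` are hypothetical, and this is not how Spark's golden-file harness compares results.

```python
import math
import struct
from collections import Counter


def same_elements(actual, expected):
    """Order-insensitive comparison that still respects duplicates."""
    return Counter(actual) == Counter(expected)


def _ordered_bits(x: float) -> int:
    """Map a double to an integer whose ordering matches the float ordering."""
    (bits,) = struct.unpack("<q", struct.pack("<d", x))
    # Negative doubles have the sign bit set; mirror them below zero so that
    # adjacent doubles always map to adjacent integers.
    return bits if bits >= 0 else -(bits & 0x7FFFFFFFFFFFFFFF)


def ulp_distance(a: float, b: float) -> int:
    """Number of representable-double steps from a to b (0 if equal)."""
    if math.isnan(a) or math.isnan(b):
        raise ValueError("ULP distance is undefined for NaN")
    return abs(_ordered_bits(a) - _ordered_bits(b))


# The collect_set example from above: same elements, different order.
print(same_elements([1, 3, 2], [1, 2, 3]))   # True
# 1.0 and the next double above it are exactly one ULP apart.
print(ulp_distance(1.0, 1.0 + 2.0 ** -52))   # 1
```

How many ULPs to tolerate for each math function is a policy choice that this sketch does not make.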
**B3a nested-type nullability residual (~14, deferred).** Native asserts on
child-field nullability mismatch in array/map kernels. Distinct signatures:
- `spark_array_slice returned List(Int32) ... expected List(Int32, nullable:
true)` -> `slice function`
- `Type mismatch in ArrayInsert: List(Int32) vs List(Int32, nullable: true)`
-> `array_insert`, `array prepend (SPARK-41233)`
- `ListArray expected Struct("key": Int32, "value": Int32) got ... non-null`
-> `map with arrays`, `map_entries`
- plus `flatten`, `array_size`, `cardinality`. Same child-nullability
inference gap previously flagged as B3a.
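The slice and `ArrayInsert` signatures above can be read as two list types that match in shape and differ only in the child field's nullable flag. The sketch below models that with a toy `Field` class. It is a hypothetical stand-in, not Arrow's or Comet's API, and the outer list's own nullability is assumed because the error text does not show it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Field:
    """Toy stand-in for an Arrow field: type name, nullability, optional child."""
    type_name: str
    nullable: bool
    child: Optional["Field"] = None


def same_shape(a: Field, b: Field) -> bool:
    """Compare type structure while ignoring every nullable flag."""
    if a.type_name != b.type_name:
        return False
    if (a.child is None) != (b.child is None):
        return False
    return a.child is None or same_shape(a.child, b.child)


# Modeled on the slice signature: the child is non-null on one side and
# nullable on the other. Outer-list nullability is an assumption.
returned = Field("List", True, child=Field("Int32", nullable=False))
expected = Field("List", True, child=Field("Int32", nullable=True))

print(returned == expected)            # False: strict equality sees the flag
print(same_shape(returned, expected))  # True: only nullability differs
```

A strict equality check rejects the pair while a shape-only check accepts it, which is the difference a fix to the nullability inference would have to reconcile.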
**B6 subquery under `CometLocalTableScan` (~17).** `CometRuntimeException:
Subquery NNN not found for plan MMM` (8 native occurrences). Hits `SPARK-36447
With in subquery` (8), `CTE Predicate push-down and column pruning` (8),
`subquery in repartition` (1). B6 was expected to resolve as a side effect of
the B1 fix but has **not**; the subquery registration path still does not
handle the new root operator (`CometLocalTableScan`). Worth a closer look.
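The native signatures quoted for B3a, B5, and B6 are distinctive enough to bucket log lines mechanically. The following is a rough sketch of such a classifier. It uses only strings quoted in this comment, assumes the `NNN` / `MMM` placeholders are numeric IDs, and tests against made-up sample messages rather than real log lines.

```python
import re

# Signature -> bucket, using only strings quoted in this comment.
# Assumption: the NNN / MMM placeholders are decimal IDs in the real logs.
SIGNATURES = [
    (re.compile(r"Subquery \d+ not found for plan \d+"), "B6"),
    (re.compile(r"spark_array_slice returned List\("), "B3a"),
    (re.compile(r"Type mismatch in ArrayInsert"), "B3a"),
    (re.compile(r"is not LocalTableScanExec"), "B5"),
]


def bucket_for(message: str) -> str:
    """Return the first bucket whose signature matches, else 'unclassified'."""
    for pattern, bucket in SIGNATURES:
        if pattern.search(message):
            return bucket
    return "unclassified"


# Hypothetical sample messages, shaped like the signatures above.
samples = [
    "CometRuntimeException: Subquery 12 not found for plan 34",
    "Type mismatch in ArrayInsert: List(Int32) vs List(Int32, nullable: true)",
    "some unrelated failure",
]
for line in samples:
    print(bucket_for(line), "<-", line)
```

Anything that falls through to `unclassified` would still need manual triage.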
**Possible real correctness regression (2) - flag for investigation.**
`group-by-filter.sql` query #36/#37: `COUNT(a) FILTER (WHERE b <= 2)` returns
**7** but should be **6**. Unlike the rest of B7 this is a wrong aggregate
value, not a cosmetic/error-class diff. The FILTER clause appears to be ignored
or mis-evaluated through the Comet aggregate path. This is the one bucket to
treat as a genuine bug rather than a test-expectation mismatch.
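For reference, the semantics at stake can be sketched in a few lines: `COUNT(a) FILTER (WHERE b <= 2)` counts rows where `a` is non-null and the predicate is true, so a row with a null `b` is excluded. The rows below are made up for illustration and are not the data from `group-by-filter.sql`; the function names are hypothetical.

```python
# Hypothetical rows (a, b); NOT the data from group-by-filter.sql.
rows = [(1, 1), (2, 2), (None, 1), (3, 3), (4, 2), (5, None)]


def count_a_filtered(rows):
    """COUNT(a) FILTER (WHERE b <= 2): non-null a among rows where b <= 2 is true."""
    return sum(1 for a, b in rows if a is not None and b is not None and b <= 2)


def count_a_unfiltered(rows):
    """COUNT(a) with the FILTER clause dropped: every non-null a is counted."""
    return sum(1 for a, _ in rows if a is not None)


print(count_a_filtered(rows))    # 3
print(count_a_unfiltered(rows))  # 5
```

An over-count like the one reported would be consistent with the filter being skipped or mis-evaluated for some row, but this sketch does not establish which rows or why.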
### Summary
Flipping `localTableScan` to default-on routes a large swath of upstream
Spark SQL tests (UDFs, expressions, plan-inspection tests) through Comet for
the first time. The 156 remaining failures split into:
- Not Comet defects (~111 by the bucket counts above): B5 plan-shape
assertions (~55) + B7 cosmetic (ordering 20, ULP 18, error-class ~18). Resolve
via skip-list / per-test config.
- Pre-existing Comet gaps newly exposed (~18): B3a nested nullability (~14),
bit/octet on Binary (4). Each a separate fix.
- Real bugs to chase (~19): B6 subquery-not-found (17, did not resolve
transitively as expected) and the group-by-filter COUNT-FILTER wrong result (2).