Re: [PR] feat: enable CometLocalTableScanExec by default [datafusion-comet]

via GitHub Tue, 02 Jun 2026 11:42:47 -0700


mbutrovich commented on PR #4393:
URL: https://github.com/apache/datafusion-comet/pull/4393#issuecomment-4605953974


   ## CI failure characterization (`logs_71941792087`, 2026-06-02)
   
   **Total: 156 failures**, all confined to upstream **Spark SQL `core` 
shards** (Spark 3.5.8 and 4.0.2). Clean across the board: Iceberg integration, 
catalyst, hive, TPC-H/DS, Rust, lint, native builds, and all of Comet's own 
exec/expressions/shuffle/scans suites. This confirms the big-blast buckets are 
resolved: B1 (NullType), B2 (TimeType), B3b (Iceberg silent corruption), B4 
(window-suite assertion). The ~1,200 -> 156 drop matches the incremental-fix 
plan.
   
   Failures by shard:
   
   | Shard | Count |
   |---|---|
   | `core-1` Spark 3.5 | 59 |
   | `core-1` Spark 4.0 | 49 |
   | `core-3` Spark 4.0 | 17 |
   | `core-2` Spark 3.5 | 12 |
   | `core-2` Spark 4.0 | 12 |
   | `core-3` Spark 3.5 | 7 |
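
As a quick cross-check of the table, the per-shard counts can be summed in a few lines of Python. The dictionary only restates the rows above; the variable and function names are illustrative.

```python
# Failure counts per (shard, Spark version), copied from the table above.
shard_failures = {
    ("core-1", "3.5"): 59,
    ("core-1", "4.0"): 49,
    ("core-3", "4.0"): 17,
    ("core-2", "3.5"): 12,
    ("core-2", "4.0"): 12,
    ("core-3", "3.5"): 7,
}


def totals_by(index):
    """Sum failures grouped by one key component (0 = shard, 1 = Spark version)."""
    out = {}
    for key, count in shard_failures.items():
        out[key[index]] = out.get(key[index], 0) + count
    return out


total = sum(shard_failures.values())
by_shard = totals_by(0)
by_version = totals_by(1)

print(total)       # 156
print(by_shard)    # {'core-1': 108, 'core-3': 24, 'core-2': 24}
print(by_version)  # {'3.5': 78, '4.0': 78}
```

The rows add up to the stated 156, split evenly between the two Spark versions, with `core-1` accounting for 108 of them.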
   
   ### Buckets (mapped to the existing B-scheme)
   
   **B5 plan-shape / explain / metrics assertions (~55, largest remaining, not 
Comet defects).** Tests grep the physical plan for Spark-only nodes or check 
codegen/SQL metrics Comet does not emit. Now that LocalTableScan is 
Comet-native, the plan shows `CometNativeColumnarToRow` / `CometHashAggregate` 
where the test expects `WholeStageCodegen`, `FilterExec`, `LocalTableScanExec`, 
etc. Examples: `SPARK-19471 AggregationIterator` (8), `basic` / `SPARK-29894 
Codegen Stage Id` / `SPARK-32615 SQLMetrics` (`SQLAppStatusListenerSuite`, 12), 
Python UDF push-down (`BatchEvalPythonExecSuite`, 8), `join strategy hint` (4), 
`ReplaceNullWithFalse` SPARK-25860/33847 (4, "is not LocalTableScanExec"), 
`Support ExplainMode`, `SPARK-37371 UnionExec columnar`, `propagate empty 
relation`, `repartitionByRange`, `full scan`. This is the deferred B5; it needs 
either a skip-list or a per-test `localTableScan.enabled=false` override.
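
To illustrate why a B5 test can fail even when the query result is correct, here is a minimal Python sketch. The plan strings are simplified, hypothetical stand-ins built from the node names mentioned above, not real explain output, and `has_node` is an illustrative helper rather than anything from Spark's test suite.

```python
# Hypothetical, simplified plan strings; real explain output is more detailed.
spark_plan = "WholeStageCodegen(1) -> HashAggregate -> LocalTableScanExec"
comet_plan = "CometNativeColumnarToRow -> CometHashAggregate -> CometLocalTableScan"


def has_node(plan: str, node: str) -> bool:
    """Return True if some node in the plan has exactly this name."""
    names = [part.strip().split("(")[0] for part in plan.split("->")]
    return node in names


# The kind of check a plan-shape test performs.
print(has_node(spark_plan, "LocalTableScanExec"))  # True
print(has_node(comet_plan, "LocalTableScanExec"))  # False
```

The second check fails purely because the operator name changed, which is why these cases are candidates for a skip-list or a per-test override rather than a code fix.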
   
   **B7 long tail (~62).** Several sub-groups, each a pre-existing 
Comet/Spark-compat gap now exposed because LocalTableScan routes these through 
Comet:
   
   - collect_set/collect_list ordering (20): `collect functions`, `collect 
functions structs`, `should be able to cast...`, `SPARK-17641`. Comet returns 
`[1,3,2]` where Spark returns `[1,2,3]`. The elements match and only their 
order differs, so the diff is cosmetic, but it still fails the golden-file 
comparison.
   - Math ULP diffs (18): `asinh`, `cosh`, `tan`, `cot`, `atanh`, `cbrt`, 
`exp`, `pow/power`, `atan2`. The last-place differences come from the JVM and 
Rust using different libm implementations.
   - `bit_length` / `octet_length` on BinaryType (4, `SPARK-36751`): 
DataFusion's string-only UDF rejects Binary input.
   - Error-class / exception-wrapper mismatches (~18): Comet throws 
`CometNativeException` / `SparkException` where Spark expects a specific class. 
`datetime-formatting-legacy` (`IllegalArgumentException` vs Comet native), 
`function to_date` / `to_unix_timestamp` (`SparkUpgradeException` expected), 
`try_aggregates` / `ansi/try_aggregates` (DIVIDE_BY_ZERO error-class format), 
`to_binary hex` (`CONVERSION_INVALID_INPUT`), `FAILED_EXECUTE_UDF`, and `null IN 
()` returning `false` instead of `null`.
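
For the ordering and ULP sub-groups, the difference between a strict golden comparison and a semantics-aware one can be sketched in Python as follows. This is only an illustration: the helper names are made up, it is not how Spark's golden-file tests compare results, and whether any ULP tolerance is acceptable is a project decision.

```python
import math
from collections import Counter


def same_elements(actual, expected):
    """Compare two collected arrays while ignoring element order.

    Uses a multiset so duplicates (possible with collect_list) still count.
    Elements must be hashable.
    """
    return Counter(actual) == Counter(expected)


def within_ulps(actual, expected, max_ulps=1):
    """True if two finite doubles differ by at most max_ulps units in the last place."""
    if actual == expected:
        return True
    return abs(actual - expected) <= max_ulps * math.ulp(expected)


# Ordering: a strict list comparison fails, a multiset comparison passes.
print([1, 3, 2] == [1, 2, 3])               # False
print(same_elements([1, 3, 2], [1, 2, 3]))  # True

# ULP: the next representable double after 1.0 is exactly one ULP away.
neighbor = math.nextafter(1.0, 2.0)
print(neighbor == 1.0)                      # False
print(within_ulps(neighbor, 1.0))           # True
```

`math.ulp` and `math.nextafter` require Python 3.9 or later.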
   
   **B3a nested-type nullability residual (~14, deferred).** Native asserts on 
child-field nullability mismatch in array/map kernels. Distinct signatures:
   
   - `spark_array_slice returned List(Int32) ... expected List(Int32, nullable: 
true)` -> `slice function`
   - `Type mismatch in ArrayInsert: List(Int32) vs List(Int32, nullable: true)` 
-> `array_insert`, `array prepend (SPARK-41233)`
   - `ListArray expected Struct("key": Int32, "value": Int32) got ... non-null` 
-> `map with arrays`, `map_entries`
   - plus `flatten`, `array_size`, `cardinality`. Same child-nullability 
inference gap previously flagged as B3a.
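
The signatures above all reduce to two list types that agree on the element type but disagree on whether the child field is nullable. A toy Python model makes the distinction concrete. It is an assumption-laden sketch, not Arrow's or DataFusion's API: `ListType` is a made-up class, and reading `List(Int32)` as "child not nullable" is an interpretation of the error text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ListType:
    """Toy stand-in for a list type; not a real Arrow class."""
    element: str            # element type name, e.g. "Int32"
    element_nullable: bool  # whether child values may be null


# Assumed reading of the error: the kernel returns a non-nullable child,
# while the declared type has a nullable child.
returned = ListType("Int32", element_nullable=False)
expected = ListType("Int32", element_nullable=True)


def strict_match(a: ListType, b: ListType) -> bool:
    """Equality that includes child nullability, like the failing assertion."""
    return a == b


def same_element_type(a: ListType, b: ListType) -> bool:
    """Equality on the element type only, ignoring child nullability."""
    return a.element == b.element


print(strict_match(returned, expected))       # False
print(same_element_type(returned, expected))  # True
```

The sketch only shows where the two types diverge; it does not say which side should change.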
   
   **B6 subquery under `CometLocalTableScan` (~17).** `CometRuntimeException: 
Subquery NNN not found for plan MMM` (8 native occurrences). Hits `SPARK-36447 
With in subquery` (8), `CTE Predicate push-down and column pruning` (8), 
`subquery in repartition` (1). B6 was expected to resolve transitively once B1 
was fixed, but it has **not**; the subquery registration path still does not 
handle the new root operator. This is worth a closer look.
   
   **Possible real correctness regression (2) - flag for investigation.** 
`group-by-filter.sql` query #36/#37: `COUNT(a) FILTER (WHERE b <= 2)` returns 
**7** but should be **6**. Unlike the rest of B7 this is a wrong aggregate 
value, not a cosmetic/error-class diff. The FILTER clause appears to be ignored 
or mis-evaluated through the Comet aggregate path. This is the one bucket to 
treat as a genuine bug rather than a test-expectation mismatch.
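
For reference, the expected semantics of `COUNT(a) FILTER (WHERE b <= 2)` can be written out in a few lines of Python. The rows below are invented for illustration and are not the data from `group-by-filter.sql`; the point is only that dropping the FILTER predicate inflates the count.

```python
# Hypothetical (a, b) rows; None stands for SQL NULL.
rows = [(1, 1), (2, 2), (3, 3), (None, 1), (4, None)]


def b_le_2(b):
    """SQL-style predicate: NULL <= 2 is not true, so the row is filtered out."""
    return b is not None and b <= 2


def count_a(rows, where=None):
    """COUNT(a), optionally restricted by a FILTER predicate on b."""
    n = 0
    for a, b in rows:
        if a is None:
            continue  # COUNT(a) skips NULL values of a
        if where is not None and not where(b):
            continue  # FILTER removes rows where the predicate is not true
        n += 1
    return n


print(count_a(rows, where=b_le_2))  # 2
print(count_a(rows))                # 4
```

An over-count like the one reported is consistent with the predicate being skipped or mis-evaluated for some rows, though the sketch cannot confirm which.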
   
   ### Summary
   
   Flipping `localTableScan` to default-on routes a large swath of upstream 
Spark SQL tests (UDFs, expressions, plan-inspection tests) through Comet for 
the first time. The 156 remaining failures split into:
   
   - Not Comet defects (~75): B5 plan-shape assertions + B7 cosmetic (ordering, 
ULP, error-class). Resolve via skip-list / per-test config.
   - Pre-existing Comet gaps newly exposed (~35): B3a nested nullability, 
bit/octet on Binary. Each a separate fix.
   - Real bugs to chase (~19): B6 subquery-not-found (17, did not resolve 
transitively as expected) and the group-by-filter COUNT-FILTER wrong result (2).
   


