schenksj opened a new pull request, #4700: URL: https://github.com/apache/datafusion-comet/pull/4700

**Part 1 of the Delta Lake contrib PR breakup.**

The native Delta scan work (delta-kernel-rs, Iceberg-style contrib) was first posted as a single ~27k-line tracking PR, #4366, which is impractical to review as one unit. This is the first of a sequence of small, independently-reviewable, independently-mergeable PRs that reconstruct that work. The full sequence and its dependency graph live in #4366.

This first slice touches **core only**. It adds a small extension contract that lets out-of-tree Comet contrib leaf scans (Delta now, Hudi and others later) take part in native planning **without core holding a compile-time reference to them**. This is the same "the edge keeps the source-specific code" shape Iceberg already uses. It ships **no Delta code** and is inert on default builds.

## Changes

- **`trait CometScanWithPlanData`**: `sourceKey` / `commonData` / `perPartitionData`, plus optional `dynamicPruningFilters` / `withDynamicPruningFilters` (for scans whose DPP filters live in a `@transient` field that `TreeNode.makeCopy` cannot carry, #3510). `CometNativeScanExec` mixes it in.
- **`foreachUntilCometInput`** now matches `case _: CometLeafExec`. This is a strict superset of the previous fixed scan list: the three leaf scans it replaces (`CometNativeScanExec`, `CometIcebergNativeScanExec`, `CometCsvNativeScanExec`) are exactly the classes that extend `CometLeafExec`.
- **`PlanDataInjector.findAllPlanData`** collects per-partition planning data via the trait instead of a hardcoded `CometNativeScanExec` match.
- **`PlanDataInjector` registry** gains one reflective `DeltaPlanDataInjector$` slot, appended to the existing `injectorsByKind` registry (#4535) **only** when a contrib bundled the class (`-Pcontrib-delta`). Default builds get `ClassNotFoundException -> None` and an unchanged registry. A class that is present but fails to bind (a misbuilt contrib jar) is logged, not silently swallowed.
- **`CometPlanAdaptiveDynamicPruningFilters`** rewrites AQE DPP filters in place for trait scans whose filters cannot survive `makeCopy`.

## What this part deliberately does NOT do yet

- **No `perPartitionFilePaths` on the trait.** That member only feeds `FAILED_READ_FILE` error conversion and lands in a later part, after #4536 (now merged).
- **No Delta code.** There is no `DeltaPlanDataInjector` on the classpath yet, so the reflective slot resolves to nothing. This part is inert.

## Why it is safe on default builds

With no contrib on the classpath the change is behavior-preserving. The leaf match is a proven superset of the old enumeration. The trait match catches the same `CometNativeScanExec` and still drives its subquery resolution. The reflective slot resolves to `None`. And the new DPP arm never fires because `CometNativeScanExec` leaves `dynamicPruningFilters` empty.

## Verification

- `CometScanWithPlanDataSuite` (new): trait-contract defaults plus reflective-slot graceful absence. 2/2.
- `CometJoinSuite` (native scan fusion and the DPP path): 28/28.
- spotless and scalastyle: clean.
- No native changes in this part.

## Roadmap

This is part 1 of the breakup. Subsequent parts add the build gate and inert wiring, the Rust planning and read path, the Scala claim/decline and execution, Change Data Feed reads, the test battery, and docs. Each later part is gated behind `-Pcontrib-delta`, so every intermediate state on `main` is safe for default builds. Tracking umbrella: #4366.
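
For reviewers who want a concrete picture of the reflective registry slot described under Changes, the following is a minimal, standalone sketch of the general pattern, not the code in this PR. `Injector`, `ReflectiveSlot`, and `load` are hypothetical names, and `System.err` stands in for whatever logger core actually uses. It shows the three outcomes the description calls out: class absent gives `None`, class present and well-formed gives `Some`, and class present but unbindable is logged rather than swallowed.

```scala
import scala.util.control.NonFatal

// Hypothetical stand-in for the injector interface; not Comet's real trait.
trait Injector { def kind: String }

object ReflectiveSlot {

  /** Try to resolve a Scala `object` by its JVM class name (which ends in `$`). */
  def load(className: String): Option[Injector] =
    try {
      val cls = Class.forName(className)
      // A top-level Scala `object Foo` compiles to class `Foo$` holding its
      // singleton instance in the public static field MODULE$.
      cls.getField("MODULE$").get(null) match {
        case injector: Injector => Some(injector)
        case other =>
          System.err.println(s"$className is present but is not an Injector: $other")
          None
      }
    } catch {
      // Contrib not bundled: the expected case on a default build.
      case _: ClassNotFoundException => None
      // Class found but could not be linked or initialized (e.g. a misbuilt jar).
      // NonFatal deliberately excludes LinkageError, so it is matched explicitly.
      case e: LinkageError =>
        System.err.println(s"$className is present but failed to link: $e")
        None
      case NonFatal(e) =>
        System.err.println(s"$className is present but failed to bind: $e")
        None
    }
}
```

With no such class on the classpath, `ReflectiveSlot.load("org.example.MissingInjector$")` returns `None` without logging, which is the default-build behavior the description relies on.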

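
The move from a hardcoded scan match to a trait match in `findAllPlanData` can be illustrated with a toy plan tree. This is a sketch under assumptions: `Plan`, `ScanWithPlanData`, `DemoScan`, and `PlanData.findAll` are hypothetical, the member types (`String`, `Array[Byte]`, `Array[Array[Byte]]`) are guesses and not taken from the PR, and none of it uses Spark's `TreeNode` API. The point is only that matching on the trait picks up any leaf that mixes it in, so core needs no compile-time reference to a contrib scan class.

```scala
// Toy plan tree; none of these types are Comet's real classes.
sealed trait Plan { def children: Seq[Plan] }

// Hypothetical mirror of the trait's three required members.
// The member types here are assumptions for illustration.
trait ScanWithPlanData extends Plan {
  def sourceKey: String
  def commonData: Array[Byte]
  def perPartitionData: Array[Array[Byte]]
  final def children: Seq[Plan] = Nil // scans are leaves
}

final case class Project(child: Plan) extends Plan {
  def children: Seq[Plan] = Seq(child)
}

final case class Join(left: Plan, right: Plan) extends Plan {
  def children: Seq[Plan] = Seq(left, right)
}

// A stand-in leaf scan; a contrib scan would look the same to the collector.
final class DemoScan(val sourceKey: String, partitions: Int) extends ScanWithPlanData {
  val commonData: Array[Byte] = sourceKey.getBytes("UTF-8")
  val perPartitionData: Array[Array[Byte]] =
    Array.fill(partitions)(Array.emptyByteArray)
}

object PlanData {
  /** Collect every leaf that exposes plan data, in left-to-right order. */
  def findAll(plan: Plan): Seq[ScanWithPlanData] = plan match {
    case scan: ScanWithPlanData => Seq(scan) // trait match, no concrete class named
    case other                  => other.children.flatMap(findAll)
  }
}
```

For example, `PlanData.findAll(Join(Project(new DemoScan("a", 2)), new DemoScan("b", 1)))` yields the two scans keyed `"a"` and `"b"`, in that order.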