parthchandra commented on PR #4339: URL: https://github.com/apache/datafusion-comet/pull/4339#issuecomment-4480202130
Thank you @schenksj for putting in this effort. Really thorough work here! However, I think this may be solving a harder problem than the one we actually need to solve right now.. In [#3932](https://github.com/apache/datafusion-comet/pull/3932#issuecomment-4296101740), I suggested putting the Delta code in a `contrib` directory -- conditionally compiled, not included by default, with fewer guarantees for end users. That's primarily an organizational and build concern: keep the Delta code isolated so it doesn't become a maintenance burden for core committers who don't have Delta expertise. This PR takes that idea significantly further with a generic extension framework. I understand the appeal of building for the general case, but I'm not sure we need this level of abstraction yet. The biggest reason for thinking so is that the API surface is substantial and we do not have enough concrete implementations to nail the API down. This PR introduces a stability commitment once it is merged - `CometScanRuleExtension` has 6 methods, `CometOperatorSerdeExtension` has 5, `ContribOperatorPlanner` has 2, `ContribPlannerContext` has 5, `ParquetDatasourceParams` has 15 fields. This is a lot to have to finalize. I think we can get the isolation we need by taking #3932's approach (which mirrors how Iceberg works today) and reorganizing it into a `contrib/delta/` directory with conditional compilation. No SPI framework is needed, instead we can have some core touchpoints that are behind a feature flag. *Directory structure:* ``` contrib/delta/ native/ # Rust: delta_kernel integration, DV filter, column mapping spark/ # Scala: CometDeltaNativeScanExec, DeltaReflection, serde proto/ # DeltaScan proto messages (or in the main proto, gated) tests/ # Delta-specific test suites ``` *Core touchpoints (feature-gated):* - `operator.proto`: `DeltaScan` as a typed variant (like `IcebergScan` at field 112) -- compile-time type safety, better debuggability than opaque `ContribOp { kind, payload }` - `planner.rs`: one `#[cfg(feature = "delta")] OpStruct::DeltaScan => { ... }` match arm (~10 lines of dispatch to a function in the delta crate) - `CometScanRule.scala`: Delta table detection behind `COMET_DELTA_NATIVE_ENABLED` config check (~20 lines, same pattern as the existing Iceberg block at lines 304-430) - `CometExecRule.scala`: one match arm for `CometDeltaNativeScanExec` (~6 lines) *Build:* - Rust: `#[cfg(feature = "delta")]` on the match arms + `delta_kernel` dependency - Scala: Maven profile `-Pcontrib-delta` (like `-Piceberg` today, or the `-Pcontrib-example` profile already in this PR) - **Not** compiled or shipped by default **Total core coupling:** ~40 lines behind feature gates, following the same pattern as Iceberg. No new abstractions, no new traits, no runtime dispatch. Iceberg already works this way in Comet -- typed `IcebergScan` proto variant, direct match arm in `planner.rs`, direct handling in `CometScanRule`. It's simple and it works. Delta can follow the same pattern. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
