adityavaish commented on PR #4669: URL: https://github.com/apache/datafusion-comet/pull/4669#issuecomment-4735610375
@schenksj would you be willing to review this and share your thoughts? You have the most context on native Delta in Comet (the delta-kernel-rs scan in #3932 and the contrib restructure in #4366), so your perspective would be very valuable. This PR is deliberately a much smaller, complementary approach to yours: behind `spark.comet.scan.delta.enabled` (default off), it just reuses Comet's existing native Parquet reader for **plain** Delta tables (no deletion vectors, no column mapping), and conservatively falls back to Spark otherwise. It's essentially a revival of #3035. It is **not** meant to replace the full delta-kernel integration — rather to give the common case a native path today while the contrib module matures. A few specific things I'd love your opinion on: 1. Does an in-core, flag-gated Tier-0 like this make sense alongside the contrib `delta-kernel` work, or would you prefer all Delta support live in `contrib/`? 2. The fallback guards: column mapping is detected via `DeltaParquetFileFormat.columnMappingMode` (reflection) and deletion vectors via the synthetic `__delta_internal_*` scan columns. Are there other Delta features that would make a plain native Parquet read incorrect that I should also guard against? 3. The Delta test dependency is wired per Spark profile (delta-core 2.4.0 for 3.4; delta-spark 3.3.2/4.0.0/4.1.0/4.2.0 for 3.5/4.0/4.1/4.2). Does that match how you handle versioning in the contrib suite? Thanks for taking a look! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
