andygrove opened a new issue, #2921: URL: https://github.com/apache/datafusion-comet/issues/2921
### What is the problem the feature request solves?

Comet currently has two different approaches to scanning Iceberg tables. One approach is based on integrating with the Iceberg Java library, and the other is based on integrating with the Rust library. Both of these approaches are experimental.

We are looking for feedback from the community on whether to continue with both approaches, or standardize on one approach.

## Implementation #1 - Integrate with Iceberg Java library

Comet contributors have been contributing to the main Iceberg repository to integrate Comet as an optional reader.

One of the major benefits of this approach is that Comet can support some Spark features more easily through JVM code, such as modular encryption and custom authentication.

There have been numerous challenges due to the fact that Comet and Iceberg depend on different versions of various libraries (including Parquet) and this has led to some workarounds involving shading and redesigning the API between Iceberg and Comet. This also adds a circular dependency between the Comet and Iceberg projects, where we need to release new Comet versions before we can update Iceberg to use that version.

There have also been challenges in getting PRs merged. It is currently necessary to patch the Iceberg source and build custom releases.

## Implementation #2 - Integrate with Iceberg Rust library

As of version 0.12.0, Comet now includes an integration with the iceberg-rust crate (which uses the same arrow-rs Parquet reader that Comet already uses).

This new approach allows unmodified Iceberg Java to handle query planning (i.e., catalog access, partition pruning, etc.), then Comet serializes Iceberg FileScanTask objects directly to iceberg-rust, enabling native execution of Iceberg table scans through DataFusion.

Parquet Modular Encryption: Encryption was recently added to the Iceberg spec in v3. PME is already supported in the arrow-rs Parquet reader used by Comet.
As part of supporting the Iceberg v3 spec, Comet would need to set encryption properties in ArrowReaderOptions that iceberg-rust does not currently expose. We would need to expose the relevant fields in the ArrowReader in a way that is spec compliant.

### Describe the potential solution

_No response_

### Additional context

_No response_

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
