andygrove opened a new issue, #2921:
URL: https://github.com/apache/datafusion-comet/issues/2921

   ### What is the problem the feature request solves?
   
   Comet currently has two different approaches to scanning Iceberg tables. One 
approach is based on integrating with the Iceberg Java library, and the other 
is based on integrating with the Rust library. Both of these approaches are 
experimental.
   
   We are looking for feedback from the community on whether to continue with 
both approaches, or standardize on one approach.
   
   ## Implementation #1 - Integrate with Iceberg Java library
   
   Comet contributors have been submitting changes to the main Iceberg 
repository to integrate Comet as an optional reader. 
   
   One of the major benefits of this approach is that Comet can support some 
Spark features more easily through JVM code, such as modular encryption and 
custom authentication.
   
   There have been numerous challenges because Comet and Iceberg depend on 
different versions of several libraries (including Parquet). This has led to 
workarounds involving shading and a redesign of the API between Iceberg and 
Comet. It also adds a circular dependency between the Comet and Iceberg 
projects: we need to release a new Comet version before we can update Iceberg 
to use that version. There have also been challenges in getting PRs merged.
   
   It is currently necessary to patch the Iceberg source and build custom 
releases.
   
   ## Implementation #2 - Integrate with Iceberg Rust library
   
   As of version 0.12.0, Comet includes an integration with the iceberg-rust 
crate (which uses the same arrow-rs Parquet reader that Comet already uses). 
With this approach, unmodified Iceberg Java handles query planning (e.g., 
catalog access and partition pruning), then Comet serializes the resulting 
Iceberg FileScanTask objects and passes them to iceberg-rust, enabling native 
execution of Iceberg table scans through DataFusion.
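
The handoff described above (planning on the JVM side, scanning on the native side) can be illustrated with a minimal, self-contained sketch. Everything in it is an assumption made for illustration: `ScanTask` is a hypothetical, heavily simplified stand-in for Iceberg's FileScanTask, and the length-prefixed byte layout is invented here. It is not Comet's actual wire format or the iceberg-rust API.

```rust
use std::convert::TryInto;

// Hypothetical, simplified stand-in for an Iceberg file scan task.
// A real FileScanTask carries much more (schema, delete files, residual
// filter); these three fields are assumptions for illustration only.
#[derive(Debug, Clone, PartialEq)]
struct ScanTask {
    data_file_path: String,
    start: u64,
    length: u64,
}

// Planner side: append one task to the buffer as
// [u32 path length][path bytes][u64 start][u64 length], all little-endian.
fn encode_task(task: &ScanTask, out: &mut Vec<u8>) {
    let path = task.data_file_path.as_bytes();
    out.extend_from_slice(&(path.len() as u32).to_le_bytes());
    out.extend_from_slice(path);
    out.extend_from_slice(&task.start.to_le_bytes());
    out.extend_from_slice(&task.length.to_le_bytes());
}

// Return the next `n` bytes and advance `pos`, or None if the buffer is short.
fn take<'a>(buf: &'a [u8], pos: &mut usize, n: usize) -> Option<&'a [u8]> {
    let end = pos.checked_add(n)?;
    let slice = buf.get(*pos..end)?;
    *pos = end;
    Some(slice)
}

// Native side: decode one task starting at `pos`.
// Returns None on truncated input or a non-UTF-8 path.
fn decode_task(buf: &[u8], pos: &mut usize) -> Option<ScanTask> {
    let path_len = u32::from_le_bytes(take(buf, pos, 4)?.try_into().ok()?) as usize;
    let data_file_path = String::from_utf8(take(buf, pos, path_len)?.to_vec()).ok()?;
    let start = u64::from_le_bytes(take(buf, pos, 8)?.try_into().ok()?);
    let length = u64::from_le_bytes(take(buf, pos, 8)?.try_into().ok()?);
    Some(ScanTask { data_file_path, start, length })
}

// Native side: decode every task in the buffer, failing as a whole
// if any single task is malformed.
fn decode_all(buf: &[u8]) -> Option<Vec<ScanTask>> {
    let mut pos = 0;
    let mut tasks = Vec::new();
    while pos < buf.len() {
        tasks.push(decode_task(buf, &mut pos)?);
    }
    Some(tasks)
}
```

In this sketch the planner would call `encode_task` once per task into a shared buffer, and the native side would call `decode_all` on the received bytes before opening each file range with its Parquet reader. A production integration would more likely use a schema-based format than a hand-rolled layout.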
   
   Parquet Modular Encryption (PME): Encryption was recently added to the 
Iceberg spec in v3. PME is already supported in the arrow-rs Parquet reader 
used by Comet. To support the Iceberg v3 spec, Comet would need to set 
encryption properties in ArrowReaderOptions that iceberg-rust does not 
currently expose. We would need to expose the relevant fields in the 
ArrowReader in a way that is spec-compliant.
   
   
   ### Describe the potential solution
   
   _No response_
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

