phillipleblanc commented on issue #1797:
URL: https://github.com/apache/iceberg-rust/issues/1797#issuecomment-3459538933

   I agree with many of the issues voiced by @Sl1mb0 and @JanKaal RE: the 
Rust/Java abstractions and the lack of integration into the 
DataFusion/Arrow/ObjectStore ecosystem. Similar to @colinmarc, we also maintain 
a fork at https://github.com/spiceai/iceberg-rust, where we mostly apply 
patches of our own that we've submitted as PRs 
([1297](https://github.com/apache/iceberg-rust/pull/1297), 
[917](https://github.com/apache/iceberg-rust/pull/917), 
[1673](https://github.com/apache/iceberg-rust/pull/1673)) but that haven't been 
merged yet.
    
   Our ideal state is to not use the IcebergTableProvider that this project 
provides out of the box. We went through a similar exercise with delta-rs 
and found it very difficult to keep the DataFusion version that delta-rs 
depends on in sync with the DataFusion version we use ourselves. We really 
like the approach that the 
[delta-kernel-rs](https://github.com/delta-io/delta-kernel-rs) team took: 
providing a good set of primitives that can be used during planning, which we 
then use to [hook into the advanced Parquet reading 
capabilities](https://github.com/spiceai/spiceai/blob/trunk/crates/data_components/src/delta_lake.rs#L357)
that DataFusion has (i.e. ParquetExec, ParquetAccessPlan, object_store, etc).
    
   So our wishlist would be:
   - A "kernel" (similar to what delta-kernel does) that separates planning 
from execution and makes it easy to integrate into a custom TableProvider.
   - Allow using object_store for the kernel IO (ref: 
https://github.com/apache/iceberg-rust/issues/172) instead of OpenDAL, since we 
are already heavily invested in it.
   - A "reference" implementation of using the kernel (i.e. it could be 
IcebergTableProvider, but maybe just an example) that shows how to separate the 
planning of which files to read (and which rows to mask) from execution, with 
a deep integration into the DataFusion ParquetExec machinery. I think it's fine 
to leave the IcebergTableProvider as a "batteries-included" provider that does 
everything using OpenDAL, as long as we have the primitives above.
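
   To make the planning/execution split in the wishlist concrete, here is a minimal sketch of the shape I have in mind. Every name in it (`DataFileEntry`, `ScanTask`, `plan_scan`, `ScanExecutor`, `DescribeExecutor`) is hypothetical and made up for illustration; none of it is an existing iceberg-rust, delta-kernel-rs, or DataFusion API, and the pruning is reduced to a single `max_id` statistic.

```rust
/// Hypothetical, simplified view of a data file as a kernel might
/// read it from table metadata. Not an iceberg-rust type.
#[derive(Debug, Clone)]
pub struct DataFileEntry {
    pub path: String,
    /// Upper bound of an `id` column, used here for file pruning.
    pub max_id: i64,
    /// Row positions removed by deletes (assumed already resolved).
    pub deleted_rows: Vec<u64>,
}

/// Output of planning: which file to read and which rows to mask.
#[derive(Debug, Clone, PartialEq)]
pub struct ScanTask {
    pub path: String,
    pub masked_rows: Vec<u64>,
}

/// Planning step: a pure function over metadata. It does no IO and
/// has no dependency on any query engine.
pub fn plan_scan(files: &[DataFileEntry], id_lower_bound: i64) -> Vec<ScanTask> {
    files
        .iter()
        // Skip files that cannot contain rows with id >= id_lower_bound.
        .filter(|f| f.max_id >= id_lower_bound)
        .map(|f| ScanTask {
            path: f.path.clone(),
            masked_rows: f.deleted_rows.clone(),
        })
        .collect()
}

/// Execution step: implemented by the engine, which decides how to
/// actually read Parquet (for us, that would be DataFusion).
pub trait ScanExecutor {
    type Output;
    fn execute(&self, task: &ScanTask) -> Self::Output;
}

/// Toy executor that only describes what it would read.
pub struct DescribeExecutor;

impl ScanExecutor for DescribeExecutor {
    type Output = String;

    fn execute(&self, task: &ScanTask) -> String {
        format!(
            "read {} skipping {} rows",
            task.path,
            task.masked_rows.len()
        )
    }
}
```

   The point of the split is that `plan_scan` could live in the kernel, while an engine supplies its own `ScanExecutor` and maps each `ScanTask` onto its own Parquet reader and IO layer.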
   
   If there was appetite to take the project in more of this direction, we 
would definitely be interested in contributing.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

