phillipleblanc commented on issue #1797: URL: https://github.com/apache/iceberg-rust/issues/1797#issuecomment-3459538933
I agree with many of the issues voiced by @Sl1mb0 and @JanKaal RE: the Rust/Java abstractions and the lack of integration into the DataFusion/Arrow/ObjectStore ecosystem.

Similar to @colinmarc, we also maintain a fork at https://github.com/spiceai/iceberg-rust, where we mostly apply our own patches that we've submitted as PRs ([1297](https://github.com/apache/iceberg-rust/pull/1297), [917](https://github.com/apache/iceberg-rust/pull/917), [1673](https://github.com/apache/iceberg-rust/pull/1673)) but that haven't been merged yet.

Our ideal state is not using the IcebergTableProvider that this project provides out of the box. We went through a similar exercise with delta-rs and found it very difficult to keep the DataFusion version that delta-rs depends on in sync with the DataFusion version we use ourselves. We really like the approach that the [delta-kernel-rs](https://github.com/delta-io/delta-kernel-rs) team took: it provides a good set of primitives that can be used during planning, which we then use to [hook into the advanced Parquet reading capabilities](https://github.com/spiceai/spiceai/blob/trunk/crates/data_components/src/delta_lake.rs#L357) that DataFusion has (i.e. ParquetExec, ParquetAccessPlan, object_store, etc).

So our wishlist would be:

- A "kernel" (similar to what delta-kernel does) that separates the planning from execution and makes it easy to integrate into a custom.
- Allow using object_store for the kernel IO (ref: https://github.com/apache/iceberg-rust/issues/172) instead of OpenDAL, since we are already heavily invested in it.
- A "reference" implementation of using the kernel (i.e. it could be IcebergTableProvider, but maybe just an example) that shows how to separate the planning of which files to read (and which rows to mask) from execution, with a deep integration into the DataFusion ParquetExec machinery.
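To make the planning/execution split concrete, here is a minimal sketch of the shape such a kernel could take. Every name in it (`DataFile`, `ScanTask`, `plan_scan`, `ScanExecutor`, `PathLister`) is hypothetical and invented for illustration. None of them come from iceberg-rust, delta-kernel-rs, or DataFusion, and the file statistics are reduced to a single `i64` column to keep it short.

```rust
// Hypothetical sketch only: these types are NOT part of iceberg-rust,
// delta-kernel-rs, or DataFusion. They illustrate the planning/execution split.
use std::ops::Range;

/// Per-file metadata as a table format might record it in its manifests
/// (simplified here to min/max statistics for one i64 column).
#[derive(Debug, Clone, PartialEq)]
pub struct DataFile {
    pub path: String,
    pub min_value: i64,
    pub max_value: i64,
    /// Row positions removed by delete files, if any.
    pub deleted_rows: Vec<u64>,
}

/// Output of planning: which file to read and which rows to mask.
#[derive(Debug, Clone, PartialEq)]
pub struct ScanTask {
    pub path: String,
    pub masked_rows: Vec<u64>,
}

/// Planning step: metadata-only work with no data-file IO and no engine dependency.
/// Keeps only files whose [min, max] statistics overlap the half-open `wanted` range.
pub fn plan_scan(files: &[DataFile], wanted: Range<i64>) -> Vec<ScanTask> {
    files
        .iter()
        .filter(|f| f.max_value >= wanted.start && f.min_value < wanted.end)
        .map(|f| ScanTask {
            path: f.path.clone(),
            masked_rows: f.deleted_rows.clone(),
        })
        .collect()
}

/// Execution step: implemented by the embedding engine, which is free to use
/// its own Parquet reader and IO layer.
pub trait ScanExecutor {
    type Output;
    fn execute(&self, tasks: &[ScanTask]) -> Self::Output;
}

/// Toy executor that only reports which paths it would open.
pub struct PathLister;

impl ScanExecutor for PathLister {
    type Output = Vec<String>;

    fn execute(&self, tasks: &[ScanTask]) -> Vec<String> {
        tasks.iter().map(|t| t.path.clone()).collect()
    }
}
```

In this shape the kernel would own only `plan_scan` and the task types, while an engine such as DataFusion would supply the `ScanExecutor` implementation and translate each task into its own Parquet access plan.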
I think it's fine to leave the IcebergTableProvider as a "batteries-included" provider that does everything using OpenDAL, as long as we have the primitives above. If there is appetite to take the project further in this direction, we would definitely be interested in contributing.
