It's possible we could wrap Iceberg et al. in Flight SQL to provide this, exposing Iceberg metadata via the Flight SQL metadata endpoints and table reads via Substrait plans. (Clients could send Substrait plans through ADBC, and we could integrate ADBC as a type of dataset.) I'm not familiar enough with Iceberg to know whether the core libraries alone are enough, or whether we'd need an attached query engine (like Spark) to support all of the features (like row-level updates/deletes).
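
To make this concrete, the client side of such a setup might look roughly like the sketch below. Everything here is illustrative: the gateway URI, table name, and plan file are made up, and the ADBC Python details (adbc_driver_flightsql.dbapi.connect, adbc_get_objects, fetch_arrow_table, and passing a serialized Substrait plan as bytes to execute) reflect my understanding of the bindings, not something I've verified against an Iceberg-backed service.

# Hypothetical Flight SQL service fronting an Iceberg catalog; the URI and
# table names are placeholders. Treat this as a sketch, not a tested example.
import adbc_driver_flightsql.dbapi as flightsql

with flightsql.connect("grpc://iceberg-gateway:31337") as conn:
    with conn.cursor() as cur:
        # Iceberg metadata surfaced through the generic ADBC/Flight SQL
        # catalog calls, so clients never touch Iceberg-specific APIs.
        print(conn.adbc_get_objects(depth="tables").read_all())

        # Plain SQL pass-through, if the service has an engine attached.
        cur.execute("SELECT * FROM warehouse.events LIMIT 10")
        print(cur.fetch_arrow_table())

        # Or hand the service a serialized Substrait plan describing the
        # table scan; my understanding is that the ADBC DBAPI treats a
        # bytes "query" as a Substrait plan.
        with open("scan_events.substrait", "rb") as f:
            cur.execute(f.read())
        print(cur.fetch_arrow_table())

The nice property is that all of the Iceberg-specific logic stays behind the service boundary, so on the datasets side we would only need one generic "ADBC dataset" type rather than per-format integrations.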
On Mon, Oct 3, 2022, at 11:25, Antoine Pitrou wrote:
> Hi all,
>
> On 03/10/2022 at 17:03, Will Jones wrote:
>> Hi Rusty,
>>
>> Note we discussed Iceberg a while ago [1]. I don't think we've discussed
>> Hudi in any depth.
>>
>> As I see it, we are waiting on three things:
>>
>> 1. Someone willing to move the Iceberg / Hudi integration forward.
>> 2. The Iceberg and Hudi projects need native libraries that we can use.
>> The base implementations are all Java, which isn't practical to integrate
>> with our C++ implementation (and the Python/R/Ruby bindings). But I think
>> these formats are complex enough that it's best to develop the core
>> implementation within the respective community, rather than within the
>> Arrow repo. There was a discussion about starting a C++/Rust
>> implementation for Iceberg [2], but I haven't seen any progress so far.
>> I haven't been watching Hudi.
>> 3. We need a model for extending Arrow C++ datasets in separate packages,
>> or else we contribute to the package size problem you mentioned in your
>> other thread [3].
>
> There may be other potential ways forward, such as integrating
> Iceberg/Hudi using a Flight or ADBC endpoint.
>
> Regards
>
> Antoine.