Hi all,

On 03/10/2022 at 17:03, Will Jones wrote:
Hi Rusty,

Note we discussed Iceberg a while ago [1]. I don't think we've discussed Hudi in any depth. As I see it, we are waiting on three things:

1. Someone willing to move the Iceberg / Hudi integration forward.
2. The Iceberg and Hudi projects need native libraries that we can use. The base implementations are all Java, which isn't practical to integrate with our C++ implementation (and the Python/R/Ruby bindings). But I think these formats are complex enough that it's best to develop the core implementation within the respective community, rather than within the Arrow repo. There was a discussion about starting a C++/Rust implementation for Iceberg [2], but I haven't seen any progress so far. I haven't been watching Hudi.
3. We need a model for extending Arrow C++ datasets in separate packages, or else we contribute to the package size problem you mentioned in your other thread [3].
There may be other ways forward, such as integrating Iceberg/Hudi through a Flight or ADBC endpoint.
Regards

Antoine.