I recently did an analysis of the OneTable project, and overall it left me a bit confused.
From an end user's perspective, no one really wants to use all three of these formats, and most companies do not have the engineering resources to maintain a stack spanning all three. Eventually people pick one and stick with it.

If the goal is to provide a converter, the individual communities have already developed their own tools, such as Delta's UniForm, Iceberg's snapshot and migrate procedures, and Hudi's bootstrap methods. The advantage of those tools is that each community knows the best way to convert a foreign data source to its native format, and can declare compatibility and fail whenever necessary. They are not bound by the expressiveness of an internal data model like OneTable, OneField, OneSchema, etc.

If the goal is format unification, then at least for me, being in the Iceberg community and a bit biased, a more straightforward way to achieve it is to extend the feature of "Iceberg external tables", where we map Hive, Delta, Hudi and other table formats directly to the Iceberg format behind a REST catalog and make them readable. This is related to a recent email thread I sent regarding the EXTERNAL/MANAGED syntax <https://lists.apache.org/thread/ohqfvhf4wofzkhrvff1lxl58blh432o6>. Linking back to this thread, that essentially makes Iceberg the unified format, and we are actually pretty close to achieving that. With this approach you get more than conversion: (1) there is no physical metadata conversion, because table metadata is converted to the Iceberg data model directly at runtime, (2) you can query all the tables using a single unified Iceberg connector in all supported engines, and (3) it is a very standardized external table concept that all database system folks immediately understand.

This makes me feel that we are trying to make OneTable a new table format without saying it is a new table format.
Although the Apache Incubation proposal clearly says "OneTable is NOT a new table format", it is hard for me to envision a long-term roadmap that does not eventually make it a table format, with connectors and data maintenance features built directly against this internal model. That kind of feels like what the commercial entity OneHouse is trying to do right now, but maybe I am wrong.

What do you think?

Best,
Jack Ye

On Tue, Dec 5, 2023 at 3:30 PM Jesús Camacho Rodríguez <jcama...@apache.org> wrote:

> Currently, there are no established group discussions. The project was
> recently open-sourced, and communication is currently done through GitHub.
> (If the project is accepted into the ASF incubator, mailing lists will be
> created.) If you're interested in regular meetings, feel free to suggest it
> to the community on GitHub.
>
> Thanks,
> Jesús
>
>
> On 2023/12/05 06:30:38 Gaurav Agarwal wrote:
> > Hi,
> > Thanks for this mail. I would like to know whether any group discussion
> > has also happened, or whether there is any call to discuss the issues.
> >
> > Thanks
> >
> >
> > On Tue, Dec 5, 2023 at 9:29 AM Walaa Eldin Moustafa <wa.moust...@gmail.com>
> > wrote:
> >
> > > Thanks Jesus for sharing OneTable. Looks like it touches upon some of the
> > > topics we discussed in the Rise of Table Formats panel at VLDB
> > > <https://ceur-ws.org/Vol-3462/CDMS18.pdf> back in September. I was
> > > browsing through the source code, and I ran into the OneField
> > > <https://github.com/onetable-io/onetable/blob/main/api/src/main/java/io/onetable/model/schema/OneField.java>
> > > class and noticed it has support for default values, which is good, but in
> > > the Iceberg spec, there are two default values
> > > <https://iceberg.apache.org/spec/#default-values> (more details in the
> > > spec and respective PR). I was pointing this out as an example of small
> > > nuances that can differ from one format to another and was wondering how
> > > OneTable is planning to bridge them?
> > >
> > > Thanks,
> > > Walaa.