I recently did an analysis of the OneTable project, and overall it left me
a bit confused.

From an end user's perspective, no one really wants to use all 3 of these
formats, and most companies do not have the engineering resources to
maintain a stack with all 3. Eventually people pick one and just stick
with it.

If the goal is to provide a converter, individual communities have already
developed their own tools, such as Delta's UniForm, Iceberg's snapshot and
migrate procedures, and Hudi's bootstrap methods. The advantage of those
tools is that the specific community knows the best way to convert a
foreign data source to its native format, and can declare compatibility and
fail whenever necessary. They are not bound by the expressiveness of an
internal data model like OneTable, OneField, OneSchema, etc.

If the goal is format unification, then at least for me, being in the
Iceberg community and a bit biased, a more straightforward way to achieve
it is to extend the "Iceberg external tables" feature, where we can map
Hive, Delta, Hudi and other table formats directly to the Iceberg format
behind a REST catalog, and make them readable. This is related to a recent
email thread I sent regarding the EXTERNAL/MANAGED syntax
<https://lists.apache.org/thread/ohqfvhf4wofzkhrvff1lxl58blh432o6>. Linking
back to this thread, that essentially makes Iceberg the unified format, and
we are actually pretty close to achieving that. With this approach, you get
more than conversion: (1) there is no physical metadata conversion, because
table metadata is converted directly to the Iceberg data model at runtime,
(2) all the tables can be queried using a single unified Iceberg connector
in all supported engines, and (3) it is a very standardized external table
concept that all database system folks immediately understand.
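
To make point (1) a bit more concrete, here is a rough Python sketch of what
runtime metadata translation could look like. Everything in it is a
hypothetical illustration: the class names, the translator functions, and
the shapes of the foreign metadata dictionaries are assumptions, not part of
Iceberg, the REST catalog spec, Delta, or Hive. The idea is only that a
catalog could translate foreign metadata on each load and fail loudly for
anything it cannot map, rather than writing converted metadata files.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class IcebergStyleField:
    # Simplified stand-in for an Iceberg schema field (hypothetical, not the real API).
    field_id: int
    name: str
    type: str
    required: bool


@dataclass(frozen=True)
class IcebergStyleTable:
    # Simplified stand-in for the table metadata a catalog would hand to an engine.
    location: str
    source_format: str
    fields: List[IcebergStyleField]


def from_delta(meta: dict) -> IcebergStyleTable:
    # Assumed input shape: {"location": str, "schema": [{"name", "type", "nullable"}]}
    fields = [
        IcebergStyleField(i + 1, col["name"], col["type"], not col["nullable"])
        for i, col in enumerate(meta["schema"])
    ]
    return IcebergStyleTable(meta["location"], "delta", fields)


def from_hive(meta: dict) -> IcebergStyleTable:
    # Assumed input shape: {"location": str, "cols": [(name, type), ...]}
    # Columns are treated as optional here, since this sketch has no nullability info.
    fields = [
        IcebergStyleField(i + 1, name, col_type, False)
        for i, (name, col_type) in enumerate(meta["cols"])
    ]
    return IcebergStyleTable(meta["location"], "hive", fields)


# One translator per foreign format; nothing is written back to storage.
TRANSLATORS: Dict[str, Callable[[dict], IcebergStyleTable]] = {
    "delta": from_delta,
    "hive": from_hive,
}


def load_table(source_format: str, foreign_meta: dict) -> IcebergStyleTable:
    # Translate at load time, and fail explicitly for formats with no mapping.
    try:
        translate = TRANSLATORS[source_format]
    except KeyError:
        raise ValueError(f"unsupported source format: {source_format}") from None
    return translate(foreign_meta)
```

An engine would then only ever see the Iceberg-shaped result of
`load_table`, regardless of which format the table is physically stored in.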

This makes me feel that we are trying to make OneTable a new table format
without saying it is a new table format. Although the Apache Incubation
proposal clearly says "OneTable is NOT a new table format", it is hard for
me to envision a long-term roadmap that does not eventually make it a table
format, with connectors and data maintenance features built directly
against this internal model. That kind of feels like what the commercial
entity OneHouse is trying to do right now, but maybe I am wrong.

What do you think?

Best,
Jack Ye

On Tue, Dec 5, 2023 at 3:30 PM Jesús Camacho Rodríguez <jcama...@apache.org>
wrote:

> Currently, there are no established group discussions. The project was
> recently open-sourced, and communication is currently done through GitHub.
> (If the project is accepted into the ASF incubator, mailing lists will be
> created). If you're interested in regular meetings, feel free to suggest it
> to the community on GitHub.
>
> Thanks,
> Jesús
>
>
> On 2023/12/05 06:30:38 Gaurav Agarwal wrote:
> > Hi,
> > Thanks for this mail. I would like to know whether any group discussion
> > has happened, or whether there is any call to discuss the issues.
> >
> > thanks
> >
> >
> > On Tue, Dec 5, 2023 at 9:29 AM Walaa Eldin Moustafa <wa.moust...@gmail.com>
> > wrote:
> >
> > > Thanks Jesus for sharing OneTable. Looks like it touches upon some of
> > > the topics we discussed in the Rise of Table Formats panel at VLDB
> > > <https://ceur-ws.org/Vol-3462/CDMS18.pdf> back in September. I was
> > > browsing through the source code, and I ran into the OneField
> > > <https://github.com/onetable-io/onetable/blob/main/api/src/main/java/io/onetable/model/schema/OneField.java>
> > > class and noticed it has support for default values, which is good, but
> > > in the Iceberg spec, there are two default values
> > > <https://iceberg.apache.org/spec/#default-values> (more details in the
> > > spec and respective PR). I was pointing this out as an example of small
> > > nuances that can differ from one format to another and was wondering
> > > how OneTable is planning to bridge them?
> > >
> > > Thanks,
> > > Walaa.
> > >
> > >
> >
>
