This patch supports snapshot queries on MOR tables: https://github.com/trinodb/trino/pull/9641. It works with the existing Hive connector.
Right now, I have only prototyped snapshot queries on COW tables with the new Hudi connector in https://github.com/codope/trino/tree/hudi-plugin. I will be working on supporting MOR tables as well.

Regards,
Sagar

On Wed, Oct 20, 2021 at 4:48 PM Jian Feng <jian.f...@shopee.com> wrote:

> When can Trino support snapshot queries on the Merge-on-Read table?
>
> On Mon, Oct 18, 2021 at 9:06 PM 周康 <zhoukang199...@gmail.com> wrote:
>
> > +1. I have sent a message on the Trino Slack; I really appreciate the new
> > Trino plugin/connector.
> > https://trinodb.slack.com/archives/CP1MUNEUX/p1623838591370200
> >
> > Looking forward to the RFC and more discussion.
> >
> > On 2021/10/17 06:06:09 sagar sumit wrote:
> > > Dear Hudi Community,
> > >
> > > I would like to propose the development of a new Trino plugin/connector
> > > for Hudi.
> > >
> > > Today, Hudi supports snapshot queries on Copy-On-Write (COW) tables and
> > > read-optimized queries on Merge-On-Read (MOR) tables with Trino,
> > > through the input-format-based integration in the Hive connector [1].
> > > This approach has known performance limitations with very large
> > > tables, which have since been fixed in PrestoDB [2]. We are working on
> > > replicating the same fixes in Trino as well [3].
> > >
> > > However, as Hudi keeps getting better, a new plugin to provide access
> > > to Hudi data and metadata will help unlock those capabilities for
> > > Trino users. Just to name a few benefits: metadata-based listing, full
> > > schema evolution, etc. [4].
> > >
> > > Moreover, a separate Hudi connector would allow its independent
> > > evolution without having to worry about hacking/breaking the Hive
> > > connector.
> > >
> > > A separate connector also falls in line with our vision [5] when we
> > > think of a standalone timeline server or a lake cache to balance the
> > > tradeoff between writing and querying. Imagine users having read and
> > > write access to data and metadata in Hudi directly through Trino.
> > >
> > > I did some prototyping to get snapshot queries on a Hudi COW table
> > > working with a new plugin [6], and I feel the effort is worth it. The
> > > high-level approach is to implement the connector SPI [7] provided by
> > > Trino, such as:
> > > a) HudiMetadata implements ConnectorMetadata to fetch table metadata.
> > > b) HudiSplit and HudiSplitManager implement ConnectorSplit and
> > > ConnectorSplitManager to produce logical units of data partitioning,
> > > so that Trino can parallelize reads and writes.
> > >
> > > Let me know your thoughts on the proposal. I can draft an RFC for the
> > > detailed design discussion once we have consensus.
> > >
> > > Regards,
> > > Sagar
> > >
> > > References:
> > > [1] https://github.com/prestodb/presto/commits?author=vinothchandar
> > > [2] https://prestodb.io/blog/2020/08/04/prestodb-and-hudi
> > > [3] https://github.com/trinodb/trino/pull/9641
> > > [4] https://cwiki.apache.org/confluence/display/HUDI/RFC+-+33++Hudi+supports+more+comprehensive+Schema+Evolution
> > > [5] https://hudi.apache.org/blog/2021/07/21/streaming-data-lake-platform#timeline-metaserver
> > > [6] https://github.com/codope/trino/tree/hudi-plugin
> > > [7] https://trino.io/docs/current/develop/connectors.html
>
> --
> *Jian Feng,冯健*
> Shopee | Engineer | Data Infrastructure
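For readers new to the Trino SPI, the two implementation points (a) and (b) in the proposal can be sketched roughly as below. This is a simplified, self-contained illustration: the real interfaces live in `io.trino.spi.connector` and carry many more methods and parameters, so the cut-down interfaces, the `listTables`/`getSplits` signatures, and the file-naming scheme here are all illustrative assumptions, not the actual Trino API or the prototype's code.

```java
import java.util.List;
import java.util.stream.Collectors;

// Simplified stand-ins for Trino's SPI interfaces. The real ones are in
// io.trino.spi.connector and have richer signatures; these exist only to
// show the shape of the connector classes named in the proposal.
interface ConnectorMetadata {
    List<String> listTables(String schemaName);
}

interface ConnectorSplit {}

interface ConnectorSplitManager {
    List<ConnectorSplit> getSplits(String tableName, List<String> partitions);
}

// a) HudiMetadata fetches table metadata. A real implementation would read
// the Hudi table's metadata/timeline rather than return a fixed list.
class HudiMetadata implements ConnectorMetadata {
    @Override
    public List<String> listTables(String schemaName) {
        return List.of("hudi_trips"); // placeholder metadata lookup
    }
}

// b) HudiSplit is one logical unit of work (e.g. one base file in one
// partition), so Trino workers can process partitions in parallel.
class HudiSplit implements ConnectorSplit {
    final String filePath;

    HudiSplit(String filePath) {
        this.filePath = filePath;
    }
}

// b) HudiSplitManager enumerates splits for a table. A real implementation
// would list files via Hudi's metadata table instead of composing paths.
class HudiSplitManager implements ConnectorSplitManager {
    @Override
    public List<ConnectorSplit> getSplits(String tableName, List<String> partitions) {
        return partitions.stream()
                .map(p -> (ConnectorSplit) new HudiSplit(tableName + "/" + p + "/base_file.parquet"))
                .collect(Collectors.toList());
    }
}

public class SplitSketch {
    public static void main(String[] args) {
        ConnectorSplitManager manager = new HudiSplitManager();
        List<ConnectorSplit> splits =
                manager.getSplits("hudi_trips", List.of("2021/10/17", "2021/10/18"));
        System.out.println(splits.size()); // one split per partition
    }
}
```

The key design point this illustrates is that the split manager is where partition pruning and parallelism are decided: each `HudiSplit` is independently schedulable on a Trino worker, which is what lets a snapshot query scan a large COW table in parallel.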