This patch supports snapshot queries on MOR tables: https://github.com/trinodb/trino/pull/9641. It works with the existing Hive connector.
Right now, I have only prototyped snapshot queries on COW tables with the new Hudi connector in https://github.com/codope/trino/tree/hudi-plugin. I will be working on supporting MOR tables as well.

Regards,
Sagar

On Wed, Oct 20, 2021 at 4:48 PM Jian Feng <jian.f...@shopee.com> wrote:

> When can Trino support snapshot queries on the Merge-on-Read table?
>
> On Mon, Oct 18, 2021 at 9:06 PM 周康 <zhoukang199...@gmail.com> wrote:
>
> > +1. I have sent a message on the Trino Slack; I really appreciate the new
> > Trino plugin/connector.
> > https://trinodb.slack.com/archives/CP1MUNEUX/p1623838591370200
> >
> > Looking forward to the RFC and more discussion.
> >
> > On 2021/10/17 06:06:09 sagar sumit wrote:
> > > Dear Hudi Community,
> > >
> > > I would like to propose the development of a new Trino plugin/connector
> > > for Hudi.
> > >
> > > Today, Hudi supports snapshot queries on Copy-On-Write (COW) tables and
> > > read-optimized queries on Merge-On-Read (MOR) tables with Trino,
> > > through the input-format-based integration in the Hive connector [1].
> > > This approach has known performance limitations with very large
> > > tables, which have since been fixed in PrestoDB [2]. We are working on
> > > replicating the same fixes in Trino as well [3].
> > >
> > > However, as Hudi keeps getting better, a new plugin to provide access
> > > to Hudi data and metadata will help unlock those capabilities for
> > > Trino users. Just to name a few benefits: metadata-based listing, full
> > > schema evolution, etc. [4].
> > >
> > > Moreover, a separate Hudi connector would allow its independent
> > > evolution without having to worry about hacking/breaking the Hive
> > > connector.
> > >
> > > A separate connector also falls in line with our vision [5] when we
> > > think of a standalone timeline server or a lake cache to balance the
> > > tradeoff between writing and querying. Imagine users having read and
> > > write access to data and metadata in Hudi directly through Trino.
> > >
> > > I did some prototyping to get snapshot queries on a Hudi COW table
> > > working with a new plugin [6], and I feel the effort is worth it. The
> > > high-level approach is to implement the connector SPI [7] provided by
> > > Trino, such as:
> > > a) HudiMetadata implements ConnectorMetadata to fetch table metadata.
> > > b) HudiSplit and HudiSplitManager implement ConnectorSplit and
> > > ConnectorSplitManager to produce logical units of data partitioning,
> > > so that Trino can parallelize reads and writes.
> > >
> > > Let me know your thoughts on the proposal. I can draft an RFC for the
> > > detailed design discussion once we have consensus.
> > >
> > > Regards,
> > > Sagar
> > >
> > > References:
> > > [1] https://github.com/prestodb/presto/commits?author=vinothchandar
> > > [2] https://prestodb.io/blog/2020/08/04/prestodb-and-hudi
> > > [3] https://github.com/trinodb/trino/pull/9641
> > > [4] https://cwiki.apache.org/confluence/display/HUDI/RFC+-+33++Hudi+supports+more+comprehensive+Schema+Evolution
> > > [5] https://hudi.apache.org/blog/2021/07/21/streaming-data-lake-platform#timeline-metaserver
> > > [6] https://github.com/codope/trino/tree/hudi-plugin
> > > [7] https://trino.io/docs/current/develop/connectors.html
>
> --
> *Jian Feng,冯健*
> Shopee | Engineer | Data Infrastructure
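For readers new to the Trino SPI, the two implementation points (a) and (b) in the proposal can be sketched roughly as below. This is a simplified, self-contained illustration: the real interfaces live in `io.trino.spi.connector` and carry many more methods and parameters, so the cut-down interfaces, the `listTables`/`getSplits` signatures, and the file-naming scheme here are all illustrative assumptions, not the actual Trino API or the prototype's code.

```java
import java.util.List;
import java.util.stream.Collectors;

// Simplified stand-ins for Trino's SPI interfaces. The real ones are in
// io.trino.spi.connector and have richer signatures; these exist only to
// show the shape of the connector classes named in the proposal.
interface ConnectorMetadata {
    List<String> listTables(String schemaName);
}

interface ConnectorSplit {}

interface ConnectorSplitManager {
    List<ConnectorSplit> getSplits(String tableName, List<String> partitions);
}

// a) HudiMetadata fetches table metadata. A real implementation would read
// the Hudi table's metadata/timeline rather than return a fixed list.
class HudiMetadata implements ConnectorMetadata {
    @Override
    public List<String> listTables(String schemaName) {
        return List.of("hudi_trips"); // placeholder metadata lookup
    }
}

// b) HudiSplit is one logical unit of work (e.g. one base file in one
// partition), so Trino workers can process partitions in parallel.
class HudiSplit implements ConnectorSplit {
    final String filePath;

    HudiSplit(String filePath) {
        this.filePath = filePath;
    }
}

// b) HudiSplitManager enumerates splits for a table. A real implementation
// would list files via Hudi's metadata table instead of composing paths.
class HudiSplitManager implements ConnectorSplitManager {
    @Override
    public List<ConnectorSplit> getSplits(String tableName, List<String> partitions) {
        return partitions.stream()
                .map(p -> (ConnectorSplit) new HudiSplit(tableName + "/" + p + "/base_file.parquet"))
                .collect(Collectors.toList());
    }
}

public class SplitSketch {
    public static void main(String[] args) {
        ConnectorSplitManager manager = new HudiSplitManager();
        List<ConnectorSplit> splits =
                manager.getSplits("hudi_trips", List.of("2021/10/17", "2021/10/18"));
        System.out.println(splits.size()); // one split per partition
    }
}
```

The key design point this illustrates is that the split manager is where partition pruning and parallelism are decided: each `HudiSplit` is independently schedulable on a Trino worker, which is what lets a snapshot query scan a large COW table in parallel.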