Re: Druid + Presto?

Mainak Ghosh Fri, 10 Jul 2020 09:36:57 -0700
+ Zhenxiao

> On Jul 9, 2020, at 11:48 PM, Gian Merlino <g...@apache.org> wrote:
> 
> One other thing I'm wondering is how similar are the two forks of Presto? Are 
> patches generally being shared between them or are they going off in 
> different directions? One example: as I understand it, aggregate pushdown 
> support was added to the core of both forks relatively recently — within the 
> last year or so — does it work the same way in each one? I'm wondering how 
> much work can be shared between these different efforts and perhaps between 
> these efforts and the Druid project itself.
> 
> On Thu, Jul 9, 2020 at 11:24 PM Gian Merlino <g...@apache.org> wrote:
> Hey Samarth,
> 
> Thanks for sharing these details.
> 
> In the overall warehouse + Druid setup you're envisioning, would Druid be the 
> main way of querying the tables that it stores? Or would they all be synced 
> periodically from the warehouse into Druid, using the warehouse as a source 
> of truth? I'm asking since I'm wondering how important it is to think about 
> functionality that might help load datasources based on tables that are in 
> the Presto metastore.
> 
> >  You bring up an interesting idea on the reverse connector. What do you 
> > think the value of such a connector will be? I am assuming Druid SQL for 
> > the most part is ANSI SQL.
> 
> Druid SQL is ANSI SQL for the most part but there are two big differences. 
> First, it doesn't support everything in ANSI SQL (two examples: it currently 
> doesn't support shuffle joins and windowed aggregations). Second, it supports 
> some functionality that is not in ANSI SQL (like the TIME_ and DS_ 
> operators). So it is smaller in some ways and bigger in other ways. I was 
> thinking a reverse translator could let you write a Druid SQL query that uses 
> our special operators, but also requires a shuffle join, and then translate 
> and execute it as an equivalent Presto SQL query. The idea being you can 
> express your query in either dialect and get routed to the right place in the 
> end.
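
A minimal sketch of the "reverse translator" idea described above, assuming only one Druid-specific operator, TIME_FLOOR, is rewritten to Presto's date_trunc. The function name `druid_to_presto` and the period-to-unit table are hypothetical and exist only for illustration. A regex rewrite is a stand-in here; a real translator would presumably work on the parsed query rather than on text.

```python
import re

# Hypothetical mapping from ISO-8601 periods (as used by Druid's TIME_FLOOR)
# to the unit names accepted by Presto's date_trunc. Deliberately incomplete.
PERIOD_TO_UNIT = {
    "PT1M": "minute",
    "PT1H": "hour",
    "P1D": "day",
    "P1M": "month",
    "P1Y": "year",
}

# Matches the two-argument form TIME_FLOOR(<column>, '<period>') only.
TIME_FLOOR = re.compile(
    r"TIME_FLOOR\(\s*([A-Za-z_][\w.]*)\s*,\s*'([^']+)'\s*\)",
    re.IGNORECASE,
)


def druid_to_presto(sql: str) -> str:
    """Rewrite Druid TIME_FLOOR calls into Presto date_trunc calls."""

    def rewrite(match: re.Match) -> str:
        column, period = match.group(1), match.group(2).upper()
        unit = PERIOD_TO_UNIT.get(period)
        if unit is None:
            # Periods such as PT15M have no direct date_trunc equivalent.
            raise ValueError(f"unsupported period: {period}")
        return f"date_trunc('{unit}', {column})"

    return TIME_FLOOR.sub(rewrite, sql)


if __name__ == "__main__":
    druid_sql = (
        "SELECT TIME_FLOOR(__time, 'PT1H') AS t, COUNT(*) "
        "FROM events GROUP BY 1"
    )
    print(druid_to_presto(druid_sql))
```

The remaining parts of the query (such as a join that Druid cannot execute) would pass through unchanged and be left for Presto to plan.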
> 
> On Thu, Jul 9, 2020 at 4:36 PM Samarth Jain <sama...@apache.org> wrote:
> Gian,
> 
> For the PrestoSQL version of the Druid connector, we decided to pursue the
> JDBC route for V1. You can follow the progress here:
> https://github.com/prestosql/presto/issues/1855
> My colleague Parth (cc'ed as well) is working on implementing Druid
> aggregation pushdown, including support for top-n style queries. Our
> immediate use cases, and what we think Druid is generally more suitable
> for, are aggregate group-by style queries. Having a presto-druid connector
> also enables us to join data in Druid with the rest of our warehouse.
> In general, though, for queries that don't do any aggregation, i.e. those
> that get translated to Druid SCAN queries, it makes sense to bypass the
> Druid data nodes altogether and go directly to deep storage. I think Druid
> provides enough metadata about the active segment files to do that
> relatively easily.
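
A rough sketch of the metadata lookup this would rely on, assuming segment descriptors have already been fetched (for example from the Coordinator) and carry a `loadSpec` field of type `s3_zip`, `hdfs`, or `local`. The helper names and all sample values are invented for illustration; reading the segment files themselves is out of scope here.

```python
def deep_storage_uri(segment: dict) -> str:
    """Return the deep-storage location recorded in a segment's loadSpec."""
    spec = segment["loadSpec"]
    kind = spec["type"]
    if kind == "s3_zip":
        return f"s3://{spec['bucket']}/{spec['key']}"
    if kind in ("hdfs", "local"):
        return spec["path"]
    raise ValueError(f"unhandled loadSpec type: {kind}")


def scan_targets(segments: list, datasource: str) -> list:
    """List the deep-storage files a scan of one datasource would read."""
    return [
        deep_storage_uri(s) for s in segments if s["dataSource"] == datasource
    ]


# Invented sample metadata; bucket, keys, and paths are placeholders.
SEGMENTS = [
    {
        "dataSource": "events",
        "interval": "2020-07-01T00:00:00.000Z/2020-07-02T00:00:00.000Z",
        "loadSpec": {
            "type": "s3_zip",
            "bucket": "example-bucket",
            "key": "druid/events/0/index.zip",
        },
    },
    {
        "dataSource": "events",
        "interval": "2020-07-02T00:00:00.000Z/2020-07-03T00:00:00.000Z",
        "loadSpec": {
            "type": "hdfs",
            "path": "hdfs://example/druid/events/1/index.zip",
        },
    },
    {
        "dataSource": "other",
        "interval": "2020-07-01T00:00:00.000Z/2020-07-02T00:00:00.000Z",
        "loadSpec": {"type": "local", "path": "/tmp/druid/other/0/index.zip"},
    },
]

if __name__ == "__main__":
    for uri in scan_targets(SEGMENTS, "events"):
        print(uri)
```

A connector taking this route would still need to filter segments by the query's time range and decode Druid's segment format.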
> 
> You bring up an interesting idea on the reverse connector. What do you
> think the value of such a connector will be? I am assuming Druid SQL for
> the most part is ANSI SQL.
> 
> On Thu, Jul 9, 2020 at 12:56 PM Zhenxiao Luo <z...@twitter.com.invalid>
> wrote:
> 
> > Thank you, Mainak.
> >
> > Hi Gian,
> >
> > Glad to see you are interested in the Presto Druid connector.
> >
> > My colleagues Hao Luo <h...@twitter.com> and Beinan Wang
> > <bein...@twitter.com> and I implemented the Presto Druid connector in
> > PrestoDB:
> > https://prestodb.io/docs/current/connector/druid.html
> >
> > Our implementation includes:
> > 1. Presto can scan Druid segments to compute SQL results.
> > 2. Aggregation pushdown, where Presto leverages Druid's fast aggregation
> > capabilities and streams the aggregated results from Druid.
> > We actually implemented two execution paths; users can use configuration
> > to control whether to scan segments or push all sub-queries down to
> > Druid.
> >
> > We have run benchmarks comparing the Presto Druid connector with other SQL
> > engines, and are ready to run production workloads.
> >
> > Thanks,
> > Zhenxiao
> >
> > On Thu, Jul 9, 2020 at 12:40 PM Mainak Ghosh <mgh...@twitter.com> wrote:
> >
> > > Hello Gian,
> > >
> > > We are currently testing the (other) Presto Druid connector at our end. It
> > > has aggregation pushdown support. Adding Zhenxiao to this thread since he
> > > is the primary developer of the connector. He can provide the kind of
> > > details you are looking for.
> > >
> > > Thanks,
> > > Mainak
> > >
> > > > On Jul 9, 2020, at 12:25 PM, Gian Merlino <g...@apache.org> wrote:
> > > >
> > > > By the way, I see that the other Presto has a Druid connector too:
> > > > https://prestodb.io/docs/current/connector/druid.html. From the docs it
> > > > looks like it has a different lineage and might even work differently.
> > > >
> > > > On Thu, Jul 9, 2020 at 12:22 PM Gian Merlino <g...@apache.org> wrote:
> > > >
> > > >> I was thinking of exploring ideas like pushing down aggregations, enabling
> > > >> Presto to query directly from deep storage (in cases where there aren't any
> > > >> interesting things to push down, this may be more efficient than querying
> > > >> Druid servers), enabling translation from Druid's SQL dialect to Presto's
> > > >> SQL dialect (a "reverse connector"), etc. Do you (or anyone else on this
> > > >> list) have any thoughts on any of those?
> > > >>
> > > >> I'm also curious what kinds of improvements you're planning for the
> > > >> connector you built.
> > > >>
> > > >> On Thu, Jul 9, 2020 at 10:18 AM Samarth Jain <samarth.j...@gmail.com>
> > > >> wrote:
> > > >>
> > > >>> Hi Gian,
> > > >>>
> > > >>> I contributed the JDBC-based presto-druid connector in PrestoSQL, which
> > > >>> went out in release 337:
> > > >>> https://prestosql.io/docs/current/release/release-337.html. The v1
> > > >>> version of the connector doesn’t support aggregate pushdown yet. It is
> > > >>> being actively worked on and we expect it to be improved over the next few
> > > >>> releases. We are currently evaluating using the presto-druid connector in
> > > >>> our Tableau setup. It would be interesting to see what changes in Druid
> > > >>> would be needed to support that integration.
> > > >>>
> > > >>> Thanks,
> > > >>> Samarth
> > > >>>
> > > >>> On Thu, Jul 9, 2020 at 10:07 AM Gian Merlino <g...@apache.org> wrote:
> > > >>>
> > > >>>> Hey Druids,
> > > >>>>
> > > >>>> I was wondering, is anyone on this list using Druid + Presto together? If
> > > >>>> so, what does your architecture look like and which edition / flavor of
> > > >>>> Presto and Druid connector are you using? What's your experience been like?
> > > >>>> I'm asking since I'm starting to think about whether it makes sense to look
> > > >>>> at ways to improve the integration between the two projects.
> > > >>>>
> > > >>>> Gian
> > > >>>>
> > > >>>
> > > >>
> > >
> > >
> >