Suppose we were to calculate the DruidQuery.QuerySpec or its enum
QueryType at the same time as the creation of a DruidQuery. Then the
DruidQuery.querySpec field would be final and mandatory, set in the
constructor. (Currently we compute it on demand a little while later.)

Then I think we'd end up with very similar code to what you are
suggesting, but it would live in sub-classes of QueryType, or perhaps
in switch statements in QuerySpec or DruidQuery, rather than in
sub-classes of DruidQuery.
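To make the alternative concrete, here is a minimal sketch of the
"final field set in the constructor, switch on the type" shape I have
in mind. All names and the type-derivation logic here are assumptions
for illustration, not actual Calcite code:

```java
// Hypothetical sketch (class names, fields, and the derivation logic are
// assumptions, not real Calcite code). The query type is computed eagerly
// in the constructor, so the rest of the class can rely on it being set,
// and per-type behavior lives in switch statements rather than in
// sub-classes of DruidQuery.
enum QueryType { SELECT, TIMESERIES, TOP_N, GROUP_BY }

class DruidQuery {
  final QueryType queryType; // final and mandatory, set in the constructor

  DruidQuery(boolean hasAggregations, boolean hasGroupBy, boolean hasLimit) {
    this.queryType = deriveType(hasAggregations, hasGroupBy, hasLimit);
  }

  // Toy stand-in for the logic currently in DruidQuery#getQuery.
  private static QueryType deriveType(boolean agg, boolean groupBy,
      boolean limit) {
    if (!agg) {
      return QueryType.SELECT;
    }
    if (!groupBy) {
      return QueryType.TIMESERIES;
    }
    return limit ? QueryType.TOP_N : QueryType.GROUP_BY;
  }

  // Per-type behavior as a switch over the already-known type.
  String queryTypeName() {
    switch (queryType) {
    case SELECT:
      return "select";
    case TIMESERIES:
      return "timeseries";
    case TOP_N:
      return "topN";
    default:
      return "groupBy";
    }
  }
}
```

Rules such as DruidSortRule could then consult queryType directly
instead of re-deriving which kind of query would be produced.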

Would that have a similar effect?

I am leery about sub-classing DruidQuery because you only get to slice
it one way. Also, when you have done the slicing, you have to move
code into the slices, and that makes it into a "big" refactoring just
by virtue of the amount of code that is copy-pasted from one place to
another.

QuerySpec already seems to do its job pretty well; it's just that
DruidQuery can't rely on it being fully there.

Julian


On Fri, Jul 21, 2017 at 4:12 PM, Zain Humayun
<[email protected]> wrote:
>
> Hello Calcite Community,
>
>
>
> Over the past few months I’ve gotten to familiarize myself with the Druid 
> adapter in Calcite, and wanted to have a discussion about the general structure 
> of the Druid adapter. For those unfamiliar, Druid supports a handful of 
> query types [1]. Each query has a specialized use case, different trade-offs, 
> different fields, etc. At the moment, DruidQuery.java acts as the 
> representation for all the different kinds of queries (Select, TopN, GroupBy, 
> Timeseries). This works well when the queries are simple, but as we try to 
> push more and more into Druid, we start to notice the drawbacks of this 
> approach.
>
>
>
> For one, when it comes time to actually issue the Druid query, a great deal 
> of logic is required to determine which kind of Druid query to generate (see 
> DruidQuery#getQuery). Another issue arises in the rules that push RelNodes into 
> Druid. Certain rules need to check whether pushing a RelNode results in 
> a DruidQuery whose query type is valid. A good example of this is 
> DruidSortRule. Again, there is more logic to determine which kind of 
> DruidQuery will be produced. This becomes a problem when one needs to add on 
> to a rule, or needs to do the same check in their own rule. Lastly, I think 
> that the current adapter structure makes it harder for newcomers to 
> understand how the adapter works, and harder to add features to it. All of 
> these problems get worse when we decide to support another (existing or 
> future) Druid query type later on.
>
>
>
> With all that said, it would then seem natural to represent each query type 
> as its own RelNode (DruidSelectQuery, DruidTopNQuery, etc.). DruidQuery can 
> serve as an abstract base class that contains a rule to transform itself into 
> the different kinds of Druid query RelNodes. Each query type will contain 
> its own set of specialized rules to push RelNodes into it, a tailored cost 
> function, and its own logic. The VolcanoPlanner takes care of the rest: whichever 
> Druid query achieves the lowest cost by pushing in the most RelNodes will 
> be chosen and executed.
>
>
>
> This, of course, would be a very large refactor, but I think it would be 
> beneficial in the long run. There’s at least one open JIRA ticket 
> (CALCITE-1206) that would benefit from this change.
>
>
>
> Anyway, I’d be interested to hear the community’s thoughts on this kind 
> of change.
>
>
>
> Thanks!
>
>
> Zain
>
>
> [1] http://druid.io/docs/0.10.0/querying/querying.html
>
