Re: Help with Calcite/Druid integration

Ted Dunning Sat, 17 Oct 2015 14:41:12 -0700

So you really just need a rule that recognizes the TIME_BUCKET function
inside the GROUP BY, right?
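
As a rough illustration of what such a rule has to recognize, here is a minimal Python sketch. It is not Calcite code and uses no Calcite API; a real rule would match on the planner's expression tree rather than on SQL text. The function name and the regular expression are hypothetical, and the output shape is taken from the "granularity" JSON quoted further down in this thread.

```python
import re

# Hypothetical pattern: TIME_BUCKET(<column>, <ISO-8601 period>, '<time zone>')
TIME_BUCKET_RE = re.compile(
    r"TIME_BUCKET\(\s*(\w+)\s*,\s*(\w+)\s*,\s*'([^']+)'\s*\)",
    re.IGNORECASE,
)

def granularity_from_group_by(sql):
    """Return a Druid period-granularity dict for the first TIME_BUCKET call
    found in the GROUP BY clause, or None if there is none."""
    # Only inspect the text after GROUP BY, so a TIME_BUCKET that appears
    # solely in the SELECT list does not count.
    parts = re.split(r"\bGROUP\s+BY\b", sql, maxsplit=1, flags=re.IGNORECASE)
    if len(parts) < 2:
        return None
    match = TIME_BUCKET_RE.search(parts[1])
    if match is None:
        return None
    _column, period, time_zone = match.groups()
    return {"type": "period", "period": period, "timeZone": time_zone}

query = """
SELECT SUM(tweet_length) AS TotalTweetLength
FROM twitterstream
GROUP BY TIME_BUCKET(__time, PT1H, 'Etc/UTC')
"""
granularity = granularity_from_group_by(query)

# The aliased form (GROUP BY t) is deliberately not handled by this sketch.
aliased = """
SELECT SUM(tweet_length) AS TotalTweetLength,
       TIME_BUCKET(__time, PT1H, 'Etc/UTC') AS t
FROM twitterstream
GROUP BY t
"""
aliased_granularity = granularity_from_group_by(aliased)
```

A purely textual match like this misses the case where the GROUP BY refers to the function through an alias, which is why the propagation question matters.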


And then probably you need to make sure that this function is propagated
correctly (this may be automatic with Calcite) to make the following
equivalent:


SELECT
SUM(tweet_length) as TotalTweetLength
FROM twitterstream
GROUP BY TIME_BUCKET(__time, PT1H, 'Etc/UTC')

SELECT
SUM(tweet_length) as TotalTweetLength,
TIME_BUCKET(__time, PT1H, 'Etc/UTC') as t
FROM twitterstream
GROUP BY t


The second form might arise from nested queries or the like, and there are
probably severe limits on how Druid can handle these buckets, which will
complicate your life.

It might also be very helpful if you look into how Drill allows data
sources to inject rules into the query optimizer. That is usually used to
express what kinds of push-down a function can accept and this seems to be
exactly such a case.

For instance, presumably, you would like to handle cases like this:

SELECT
SUM(tweet_length) as TotalTweetLength
FROM twitterstream
GROUP BY TIME_BUCKET(__time, PT1H, 'Etc/UTC'), floor(tweet_length/10)

(assuming Druid can do this)
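
To handle a query like that, the rule would first have to separate the time-bucket key from the other grouping keys. A small Python sketch of that split follows, again on SQL text only and not on Calcite's data structures. The helper names are hypothetical, and the second key, floor(tweet_length/10), is chosen purely for illustration.

```python
def split_top_level(expr_list):
    """Split a comma-separated expression list, ignoring commas that sit
    inside parentheses or single-quoted strings."""
    parts, current = [], []
    depth, in_quote = 0, False
    for ch in expr_list:
        if ch == "'":
            in_quote = not in_quote
        elif not in_quote:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            elif ch == "," and depth == 0:
                # A top-level comma ends the current grouping key.
                parts.append("".join(current).strip())
                current = []
                continue
        current.append(ch)
    tail = "".join(current).strip()
    if tail:
        parts.append(tail)
    return parts

def partition_group_keys(group_by):
    """Return (time_bucket_keys, other_keys) for a GROUP BY expression list."""
    keys = split_top_level(group_by)
    time_buckets = [k for k in keys if k.upper().startswith("TIME_BUCKET(")]
    others = [k for k in keys if not k.upper().startswith("TIME_BUCKET(")]
    return time_buckets, others

time_buckets, others = partition_group_keys(
    "TIME_BUCKET(__time, PT1H, 'Etc/UTC'), floor(tweet_length/10)"
)
```

The time-bucket key is the candidate for the "granularity" field; whether the remaining keys can be pushed down as well depends on what Druid supports.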





On Fri, Oct 16, 2015 at 10:48 PM, Jonathan Wei <[email protected]> wrote:

> The time bucketing I have in mind is a feature supported by an existing SQL
> client for Druid, it would be used with GROUP BY:
>
> https://github.com/implydata/plyql
>
> An example of it would be:
>
> plyql -h 10.20.30.40 -i P1D -q "
> SELECT
> SUM(tweet_length) as TotalTweetLength
> FROM twitterstream
> GROUP BY TIME_BUCKET(__time, PT1H, 'Etc/UTC')
> "
>
> The "GROUP BY TIME_BUCKET(__time, PT1H, 'Etc/UTC')" would be equivalent to
> specifying the following within the Druid query JSON:
>
> ...
> "granularity": {"type": "period", "period": "PT1H", "timeZone": "Etc/UTC"}
> ...
>
>
> On Fri, Oct 16, 2015 at 6:13 PM, Ted Dunning <[email protected]>
> wrote:
>
> > On Fri, Oct 16, 2015 at 4:33 PM, Jonathan Wei <[email protected]> wrote:
> >
> > > 2.) I would like to extend the SQL syntax with a new function for
> > > specifying the desired time bucketing properties for a Druid query, to be
> > > translated into "granularity":
> > >
> > > http://druid.io/docs/0.8.1/querying/granularities.html
> > >
> > > Can someone point me to a good resource or example for doing this in
> > > Calcite?
> > >
> >
> > Why is this not a group by operation?
> >
>
