Hello,

I've recently been evaluating an integration of Calcite with Druid:

http://druid.io/druid.html

I have a few questions, but first let me describe what I am trying to
accomplish.

The idea would be to use Calcite's SQL parsing and planning/optimization
functionality to generate a JSON Druid query string which gets passed in
its entirety to the Druid broker.
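For reference, handing the finished JSON string to the broker is a single HTTP POST to Druid's native query endpoint, `/druid/v2/`. The sketch below assumes Java 11+ (`java.net.http`) and a broker at `localhost:8082`, the usual default broker port. The class name `DruidBrokerClient` is hypothetical, not part of Calcite or Druid.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical helper: posts a complete Druid query JSON string to a broker. */
final class DruidBrokerClient {
    private final URI endpoint;

    DruidBrokerClient(String host, int port) {
        // Druid brokers accept native JSON queries via POST on /druid/v2/
        this.endpoint = URI.create("http://" + host + ":" + port + "/druid/v2/");
    }

    /** Builds the POST request without sending it, so it can be inspected or logged. */
    HttpRequest buildRequest(String queryJson) {
        return HttpRequest.newBuilder(endpoint)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(queryJson))
                .build();
    }

    /** Sends the query and returns the raw JSON result as a string. */
    String execute(String queryJson) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(buildRequest(queryJson), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Druid broker returned HTTP " + response.statusCode());
        }
        return response.body();
    }
}
```

Keeping request construction separate from sending makes it easy to log the generated query while debugging the planner output.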

I read the available Calcite docs and looked through some example code, and
the MongoDB example seemed similar to what I want, so I started from that
code with a few modifications.

I'm pretty new to query planning/optimization, so please let me know if my
approach is misguided or my understanding is incorrect.

I am at a point where I can generate a basic Druid groupBy query from a SQL
string:
http://druid.io/docs/0.8.1/querying/groupbyquery.html
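As a concrete picture of the target, the sketch below serializes a basic groupBy query by hand using the field names from the groupBy docs (`queryType`, `dataSource`, `granularity`, `dimensions`, `aggregations`, `intervals`). The class and method names are hypothetical; a real adapter would more likely use a JSON library than string concatenation.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch: hand-built JSON for a Druid groupBy query. */
final class GroupByJson {
    /** Quotes a string as a JSON literal (escapes only backslash and double quote). */
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    static String array(List<String> jsonItems) {
        return "[" + String.join(",", jsonItems) + "]";
    }

    /** One aggregator; pass fieldName == null for the "count" aggregator, which has none. */
    static String aggregator(String type, String name, String fieldName) {
        String json = "{\"type\":" + quote(type) + ",\"name\":" + quote(name);
        if (fieldName != null) {
            json += ",\"fieldName\":" + quote(fieldName);
        }
        return json + "}";
    }

    /** dataSourceJson and granularityJson are already JSON (e.g. a quoted name). */
    static String groupBy(String dataSourceJson, List<String> dimensions,
                          List<String> aggregators, String granularityJson, String interval) {
        List<String> dims = dimensions.stream()
                .map(GroupByJson::quote)
                .collect(Collectors.toList());
        return "{\"queryType\":\"groupBy\""
                + ",\"dataSource\":" + dataSourceJson
                + ",\"granularity\":" + granularityJson
                + ",\"dimensions\":" + array(dims)
                + ",\"aggregations\":" + array(aggregators)
                + ",\"intervals\":" + array(List.of(quote(interval)))
                + "}";
    }
}
```

Taking `dataSource` as pre-rendered JSON leaves room for it to be either a table name or a nested query object.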

Some guidance would be greatly appreciated on the following:
---

1.) I am trying to add support for "nested GroupBys", i.e. something like
"SELECT foo FROM (SELECT ...) GROUP BY bar". When it is time to generate
the Druid query JSON in implement() of my DruidToEnumerableConverter, how
would I determine how "deep" the various parts of my query plan are, so
that I know to generate a nested query?

---

2.) I would like to extend the SQL syntax with a new function for
specifying the desired time bucketing properties for a Druid query, to be
translated into "granularity":

http://druid.io/docs/0.8.1/querying/granularities.html

Can someone point me to a good resource or example for doing this in
Calcite?
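Whatever the SQL-side mechanism turns out to be, the adapter will eventually need to turn the function's argument into Druid's `granularity` value. A minimal sketch, assuming a hypothetical function such as `TIME_BUCKET('hour')` or `TIME_BUCKET('PT6H')`: known simple names map to a string granularity, and ISO-8601 periods map to a `{"type":"period",...}` object. The name set below is a subset of Druid's simple granularities; `GranularityMapper` is hypothetical.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch: map a bucketing-function argument to Druid granularity JSON. */
final class GranularityMapper {
    // A subset of Druid's simple granularity names.
    private static final Set<String> SIMPLE = Set.of(
            "all", "none", "minute", "fifteen_minute", "thirty_minute", "hour", "day");

    static String toGranularityJson(String arg) {
        String lower = arg.toLowerCase(Locale.ROOT);
        if (SIMPLE.contains(lower)) {
            return "\"" + lower + "\"";
        }
        // Anything starting with 'P' is treated as an ISO-8601 period (not validated further).
        if (arg.length() > 1 && arg.charAt(0) == 'P') {
            return "{\"type\":\"period\",\"period\":\"" + arg + "\"}";
        }
        throw new IllegalArgumentException("unsupported granularity: " + arg);
    }
}
```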

---

3.) In the MongoDB example, I see that some of the Mongo RelNode classes,
such as MongoFilter, define computeSelfCost as 0.1 times their
superclass's cost. Is this just to ensure that those nodes have the lowest
cost? It doesn't seem to perform a real cost calculation.
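The effect of such a fixed factor can be shown with a toy model in plain Java (this is not Calcite's API): when the planner picks the cheapest of several equivalent plans, scaling the adapter node's cost by 0.1 makes the pushed-down alternative win whenever the base estimates are comparable, without modelling what the backend actually does. `CostBiasDemo` and the row-count `baseCost` are assumptions for illustration.

```java
import java.util.Comparator;
import java.util.List;

/** Toy model (not Calcite's API): choosing the cheapest of equivalent plans. */
final class CostBiasDemo {
    static final class Alternative {
        final String name;
        final double cost;
        Alternative(String name, double cost) { this.name = name; this.cost = cost; }
    }

    /** Stand-in for a generic estimate proportional to input rows. */
    static double baseCost(double inputRows) {
        return inputRows;
    }

    /** Returns the alternative with the lowest cost. */
    static Alternative cheapest(List<Alternative> alternatives) {
        return alternatives.stream()
                .min(Comparator.comparingDouble(a -> a.cost))
                .orElseThrow();
    }
}
```

For example, with one million input rows an `EnumerableFilter` alternative costs `baseCost(rows)` while a `DruidFilter` alternative costs `0.1 * baseCost(rows)`, so `cheapest` returns the Druid one. The factor is a bias toward pushdown rather than an estimate of Druid's real work.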

Is there a good reference if I want to build a cost model for Druid queries?

---

Thanks,
Jonathan
