Hello,

I've recently been evaluating integrating Calcite with Druid: http://druid.io/druid.html

I have a few questions, but first let me describe what I am trying to accomplish.

The idea is to use Calcite's SQL parsing and planning/optimization functionality to generate a JSON Druid query string, which is then passed in its entirety to the Druid broker.

I read the available Calcite docs and looked through some example code. The MongoDB example seemed similar to what I want, so I started from that code and made a few modifications. I'm pretty new to query planning/optimization, so please let me know if my approach is misguided or my understanding is incorrect.

I am at a point where I can generate a basic Druid groupBy query from a SQL string: http://druid.io/docs/0.8.1/querying/groupbyquery.html

Some guidance on the following would be greatly appreciated:

---

1.) I am trying to add support for "nested GroupBys", which would be something like "SELECT foo FROM (SELECT ...) GROUP BY bar".

When it is time to generate the Druid query JSON in implement() of my DruidToEnumerableConverter, how would I determine how "deep" the various parts of my query plan are, so that I know to generate a nested query?

---

2.) I would like to extend the SQL syntax with a new function for specifying the desired time bucketing properties of a Druid query, to be translated into "granularity": http://druid.io/docs/0.8.1/querying/granularities.html

Can someone point me to a good resource or example for doing this in Calcite?

---

3.) In the MongoDB example, I see that some of the Mongo RelNode classes, like MongoFilter, define computeSelfCost to return 0.1 of their superclass' cost.

Is this to ensure that those nodes have the "lowest cost"? It doesn't seem to be performing a real cost calculation. Is there a good reference if I want to build a cost model for Druid queries?

---

Thanks,
Jonathan
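
P.S. To make question 1 more concrete, here is a minimal sketch of the output shape I am after. It is written in Python for brevity and is not Calcite code: GroupByNode, to_druid and nesting_depth are hypothetical names, and "events" is a made-up table. It assumes the plan is a simple chain of group-bys, and it omits the other fields a real groupBy needs (intervals, granularity, aggregations). The inner query is wrapped in a dataSource of type "query".

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GroupByNode:
    """Hypothetical stand-in for a plan node; NOT a Calcite class."""
    dimensions: List[str]
    table: Optional[str] = None             # set only on the innermost node
    child: Optional["GroupByNode"] = None   # set when grouping over a subquery


def nesting_depth(node: GroupByNode) -> int:
    """Number of subquery levels below this node (0 for a plain table scan)."""
    return 0 if node.child is None else 1 + nesting_depth(node.child)


def to_druid(node: GroupByNode) -> dict:
    """Emit a groupBy query; a child node becomes a nested query dataSource."""
    if node.child is None:
        data_source = node.table
    else:
        data_source = {"type": "query", "query": to_druid(node.child)}
    return {
        "queryType": "groupBy",
        "dataSource": data_source,
        "dimensions": node.dimensions,
    }


# Roughly: SELECT bar FROM (SELECT bar, foo FROM events GROUP BY bar, foo) GROUP BY bar
inner = GroupByNode(dimensions=["bar", "foo"], table="events")
outer = GroupByNode(dimensions=["bar"], child=inner)
print(json.dumps(to_druid(outer), indent=2))
```

In this toy version the depth never has to be tracked explicitly, because the recursion over the child produces the nesting. What I don't know is whether the same approach is appropriate when walking the real RelNode tree.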
