FYI - Druid devs are proposing to embed Calcite in Druid, so that Druid has a 
native SQL interface.

I think Gian’s plan makes sense; its architecture is similar in many ways to 
how we are integrating with Phoenix. Calcite’s Druid adapter will still 
exist, still be useful, and in fact I expect that Druid devs will end up 
building on it.

I replied on the thread: !topic/druid-development/3npt9Qxpjr0


> Begin forwarded message:
> From: Gian Merlino
> Subject: [druid-dev] [Proposal] Built-in SQL for Druid
> Date: October 12, 2016 at 9:50:48 AM PDT
> Inspired by the Calcite Druid adapter, I've been playing around with 
> something similar that lives inside the Druid Broker. It seems promising, 
> so in this proposal I'm suggesting we include an official SQL server inside 
> Druid itself.
> I am hoping that we can:
> 1) Use Calcite for SQL parsing and optimizing, and use Avatica for the 
> server and the JDBC client.
> 2) Like the official Calcite Druid adapter, have a set of rules that push 
> down filters, projections, aggregations, sorts, etc. into normal Druid 
> queries.
> 3) Unlike the official Calcite Druid adapter, use Druid objects (like 
> DimFilter, ExtractionFn, etc.) as model classes, since that avoids extra 
> code, helps with type safety, and speeds up development (see the sketch 
> after this list).
> 4) Have this all run on the Broker, which would then make normal Druid 
> queries to data nodes.
> 5) Work towards being able to push down more and more SQL into normal Druid 
> queries over time.
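> For illustration, here's a minimal sketch of what (3) could look like. The 
> class names and constructors are assumptions based on Druid's current model 
> classes, and the column names are made up, so treat this as a sketch rather 
> than a final API:
> 
>     import java.util.Arrays;
> 
>     import io.druid.query.extraction.SubstringDimExtractionFn;
>     import io.druid.query.filter.AndDimFilter;
>     import io.druid.query.filter.DimFilter;
>     import io.druid.query.filter.SelectorDimFilter;
> 
>     public class FilterTranslationSketch {
>       // Roughly what a planner rule would emit for the SQL predicate
>       //   WHERE countryName = 'US' AND SUBSTRING(cityName, 1, 3) = 'San'
>       // using Druid's own model classes rather than a parallel set of POJOs.
>       public static DimFilter translate() {
>         DimFilter country = new SelectorDimFilter("countryName", "US", null);
>         DimFilter cityPrefix = new SelectorDimFilter(
>             "cityName",
>             "San",
>             // SQL SUBSTRING is 1-based; the extraction fn index is 0-based.
>             new SubstringDimExtractionFn(0, 3)
>         );
>         return new AndDimFilter(Arrays.asList(country, cityPrefix));
>       }
>     }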
> Current status
> If people are interested in this proposal then I'll clean up the code a bit 
> and do a PR. Currently it's a rough prototype. Some things that do work:
> 1) Avatica handler running at /druid/v2/sql/ + Avatica JDBC driver (see the 
> JDBC sketch after this list)
> 2) Determining column types with segment metadata queries
> 3) Pushing down operator sequences that look like filter -> project -> 
> aggregate -> project -> sort into groupBy, timeseries, and select queries as 
> appropriate
> 4) Using "intervals" to filter on time when appropriate
> 5) LIKE, range, equality, and boolean filters
> 6) Some extraction fns like SUBSTRING, CHAR_LENGTH
> 7) GROUP BY FLOOR(__time TO gran) for time-series
> 8) Arithmetic post-aggregations
> 9) Filtered aggregations using CASE or using FILTER(WHERE ...)
> 10) Semi-joins like SELECT ... WHERE xxx IN (SELECT ...) can run by 
> materializing the inner result on the Broker and applying it to the outer 
> query as a filter. Obviously this doesn't always work, but it works 
> sometimes (and more often than pulling the left-hand side into the Broker…).
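> To make the Avatica piece concrete, here's a hedged sketch of connecting 
> through the Avatica JDBC driver and running a query that exercises a few of 
> the working features above. The host, port, and "wikipedia" datasource are 
> placeholders, and the endpoint is the prototype's /druid/v2/sql/ handler:
> 
>     import java.sql.Connection;
>     import java.sql.DriverManager;
>     import java.sql.ResultSet;
>     import java.sql.Statement;
> 
>     public class DruidSqlJdbcExample {
>       public static void main(String[] args) throws Exception {
>         // Avatica's remote JDBC driver; needs the Avatica client jar on
>         // the classpath.
>         String url =
>             "jdbc:avatica:remote:url=http://broker-host:8082/druid/v2/sql/";
>         try (Connection conn = DriverManager.getConnection(url);
>              Statement stmt = conn.createStatement();
>              // Time grouping via FLOOR(__time TO gran) plus a filtered
>              // aggregation, both of which push down per the list above.
>              ResultSet rs = stmt.executeQuery(
>                  "SELECT FLOOR(__time TO DAY) AS d, COUNT(*) AS cnt, "
>                      + "COUNT(*) FILTER (WHERE page LIKE 'Wiki%') AS wiki "
>                      + "FROM wikipedia "
>                      + "GROUP BY FLOOR(__time TO DAY) ORDER BY d")) {
>           while (rs.next()) {
>             System.out.println(rs.getTimestamp("d") + " "
>                 + rs.getLong("cnt") + " " + rs.getLong("wiki"));
>           }
>         }
>       }
>     }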
> Non-exhaustive list of things that don't work:
> 1) Pushing down filter after aggregate (HAVING)
> 2) Push down of anything without a native Druid analog, like multi-column 
> extraction fns, aggregation of expressions, window functions, etc.
> 3) Any extraction fns other than SUBSTRING, CHAR_LENGTH
> 4) A lot of time stuff, like x + INTERVAL, FLOOR(__time TO MONTH) = x, etc.
> 5) Query time lookups
> 6) Select with pagination – only the first 1000 results are used
> 7) Any sort of memory usage controls on the Broker side
> 1) Why another SQL on Druid thing? There's already, like, 7 of them.
> I think the fact that there are 7 of them means there's clearly some value in 
> having a built-in implementation. Partially this is so we can hopefully share 
> some work between the projects. Partially this is because Druid doesn't 
> support some things that are needed for well-rounded SQL support (like 
> multi-column extraction fns, aggregations of expressions, etc) and having the 
> SQL layer inside the Druid repo will make it possible to develop those sorts 
> of features hand in hand with the SQL planner rules.
> Btw, the 7 that I counted are, in alphabetical order: Calcite, Drill, 
> Druid's own simple grammar (added in 2013 and since removed), Hive, PlyQL, 
> Sparkline, and Sql4D.
> 2) Is the proposed SQL language actually SQL or is it "SQL-like"?
> In terms of what can be efficiently pushed down to Druid queries, it's 
> "SQL-like". A lot of common SQL features aren't supported – although I think 
> it makes sense to add more over time. Technically Calcite does speak full 
> SQL, but a lot of it at the start would get planned as pulling all the raw 
> data into the Broker and processing it in Calcite's interpreter.
> 3) Why not use the Druid adapter in Calcite?
> Calcite's Druid adapter doesn't depend on any Druid jars; it implements the 
> query language and protocol using its own set of model and client classes. 
> For a built-in approach I wanted to be able to use Druid's own Query, 
> DimFilter, ExtractionFn, etc in a type-safe way, and wanted to use the query 
> code that already exists in Druid for discovering and querying data nodes. I 
> think this will also help speed up development of Druid features that allow 
> more SQL to be pushed down.
> 4) Can we share work between a built-in Druid SQL and the other SQL on Druid 
> adapters that people are working on?
> Hopefully! I think it would make sense if a built-in Druid SQL could be used 
> for whatever Druid supports natively, and external SQL on Druid adapters 
> could be used when users want to do something that Druid doesn't support. 
> Sharing the work needed to translate "whatever Druid supports natively" into 
> Druid queries would help everyone.
> Hive and Drill already use Calcite internally, and I hope it's workable to 
> stuff Druid's own rules into their planners without changing too much. If 
> those projects are comfortable embedding druid-server then that should work 
> straight away. If they aren't comfortable embedding druid-server (perhaps 
> understandably) then we could bite the bullet and work on a light(er) weight 
> druid-client jar that has just enough to give us the benefit of type 
> checking, and does not include all the heavy Druid functionality.
> If you're working on one of those projects, feedback is greatly appreciated.
> 5) What happens when parts of the SQL query can't be converted to a native 
> Druid query?
> Calcite is rad and runs the parts that can't be pushed down through an 
> interpreter on the Druid Broker. Of course this means that if you use 
> constructs that are close to the data and can't be pushed down, like grouping 
> on CONCAT(foo, bar) or aggregating SUM(3 * bar), potentially a surprisingly 
> large amount of data will be pulled out into the Broker. This is not great 
> behavior and something should be done about that…
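> One way to see which side of that line a query lands on: Calcite supports 
> EXPLAIN PLAN FOR, so (assuming the prototype wires it through) you can 
> check whether a query planned into native Druid queries or into an 
> interpreter over a raw scan. A sketch, reusing the placeholder URL and 
> datasource from the JDBC example above:
> 
>     import java.sql.Connection;
>     import java.sql.DriverManager;
>     import java.sql.ResultSet;
>     import java.sql.Statement;
> 
>     public class ExplainSketch {
>       public static void main(String[] args) throws Exception {
>         String url =
>             "jdbc:avatica:remote:url=http://broker-host:8082/druid/v2/sql/";
>         try (Connection conn = DriverManager.getConnection(url);
>              Statement stmt = conn.createStatement();
>              // Grouping on a multi-column expression has no native Druid
>              // analog today, so expect the plan to show Calcite operators
>              // above a raw scan rather than one pushed-down Druid query.
>              ResultSet rs = stmt.executeQuery(
>                  "EXPLAIN PLAN FOR "
>                      + "SELECT countryName || cityName, COUNT(*) "
>                      + "FROM wikipedia "
>                      + "GROUP BY countryName || cityName")) {
>           while (rs.next()) {
>             System.out.println(rs.getString(1));
>           }
>         }
>       }
>     }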
> 6) What about JOINs?
> I don't know; maybe it makes sense for Druid to have query types usable for 
> joins in the future. But it doesn't today; the closest thing is query-time 
> lookups, which are like a broadcast join. Without native join support in 
> Druid, it makes more sense to pull data out of Druid into another system 
> (like Drill or Hive or Spark) and do the join there. Even if Druid did 
> support native joins, there's still some value in using an external execution 
> engine to join Druid data with data from some other system. Filters and 
> aggregations can still potentially be pushed down, depending on the query.
> 7) JDBC works, but what about ODBC?
> Avatica's home page says work on an ODBC client has not yet started. There 
> is a page about an ODBC driver that is interesting, since the Phoenix Query 
> Server is also Avatica-based, so maybe that work could be useful? However, 
> it doesn't seem to be open source, and when I tried to get the binary to 
> work, the Windows ODBC setup thing crashed after calling getTableTypes. 
> Maybe someone at Hortonworks can comment :)
> Gian
