FYI - Druid devs are proposing to embed Calcite in Druid, so that Druid has a native SQL interface.
I think Gian’s plan makes sense; its architecture is similar in a lot of ways to how we are integrating with Phoenix. Calcite’s Druid adapter will still exist, still be useful, and in fact I expect that Druid devs will end up building on it. I replied on the thread: https://groups.google.com/forum/?pli=1#!topic/druid-development/3npt9Qxpjr0

Julian

> Begin forwarded message:
>
> From: Gian Merlino <[email protected]>
> Subject: [druid-dev] [Proposal] Built-in SQL for Druid
> Date: October 12, 2016 at 9:50:48 AM PDT
> To: [email protected]
> Reply-To: [email protected]
>
> Inspired by the Calcite Druid adapter
> (https://groups.google.com/d/topic/druid-development/FK5D162ao74/discussion),
> I've been playing around with something similar that lives inside of the
> Druid Broker. It seems promising, so in this proposal I'm suggesting we
> include an official SQL server inside Druid itself.
>
> I am hoping that we can:
>
> 1) Use Calcite for SQL parsing and optimizing, and use Avatica
> (https://calcite.apache.org/docs/avatica_overview.html) for the server and
> the JDBC client.
> 2) Like the official Calcite Druid adapter, have a set of rules that push
> down filters, projections, aggregations, sorts, etc. into normal Druid queries.
> 3) Unlike the official Calcite Druid adapter, use Druid objects (like
> DimFilter, ExtractionFn, etc.) as model classes, since that avoids extra code,
> helps with type safety, and speeds up development.
> 4) Have this all run on the Broker, which would then make normal Druid
> queries to data nodes.
> 5) Work towards being able to push down more and more SQL into normal Druid
> queries over time.
>
> Current status
>
> If people are interested in this proposal then I'll clean up the code a bit
> and do a PR.
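[Forwarder's aside: the rule-based pushdown in step 2 can be illustrated with a toy sketch. This is a minimal Python simulation, not the actual Java planner code; the `translate` helper, the plan shape, and the sample datasource/columns are invented for illustration, while the output dict follows Druid's documented groupBy query JSON.]

```python
# Hypothetical sketch: translate a tiny logical plan (filter -> aggregate)
# into a native Druid groupBy query. The translate() helper and its inputs
# are invented for illustration; the output shape follows Druid's
# documented groupBy query JSON.

def translate(datasource, interval, filter_dim, filter_value, group_dim, sum_metric):
    """Build a native Druid groupBy query roughly equivalent to:
    SELECT group_dim, SUM(sum_metric) FROM datasource
    WHERE filter_dim = filter_value GROUP BY group_dim
    """
    return {
        "queryType": "groupBy",
        "dataSource": datasource,
        "intervals": [interval],
        "granularity": "all",
        "filter": {"type": "selector", "dimension": filter_dim, "value": filter_value},
        "dimensions": [group_dim],
        "aggregations": [{"type": "doubleSum", "name": "sum_" + sum_metric,
                          "fieldName": sum_metric}],
    }

query = translate("wikipedia", "2016-01-01/2016-02-01",
                  "countryName", "United States", "page", "added")
print(query["queryType"])  # groupBy
```

The point of the planner rules is exactly this kind of rewrite: recognize a relational-operator pattern Calcite produced and emit the equivalent native query so the data nodes, not the Broker, do the work.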
> Currently it's a rough prototype. Some things that do work:
>
> 1) Avatica handler running at /druid/v2/sql/ + Avatica JDBC driver
> 2) Determining column types with segment metadata queries
> 3) Pushing down operator sequences that look like filter -> project ->
> aggregate -> project -> sort into groupBy, timeseries, and select queries as
> appropriate
> 4) Using "intervals" to filter on time when appropriate
> 5) LIKE, range, equality, and boolean filters
> 6) SUM, MIN, MAX, AVG, COUNT, COUNT DISTINCT
> 7) Some extraction fns like SUBSTRING, CHAR_LENGTH
> 8) GROUP BY FLOOR(__time TO gran) for time series
> 9) Arithmetic post-aggregations
> 10) Filtered aggregations using CASE or using FILTER(WHERE ...)
> 11) Semi-joins like SELECT ... WHERE xxx IN (SELECT ...) can run by
> materializing the inner result on the Broker and applying it to the outer
> query as a filter. Obviously this doesn't always work, but it works sometimes
> (and it works more often than pulling the left-hand side into the Broker…).
>
> Non-exhaustive list of things that don't work:
>
> 1) Pushing down filter after aggregate (HAVING)
> 2) Pushdown of anything without a native Druid analog, like multi-column
> extraction fns, aggregation of expressions, window functions, etc.
> 3) Any extraction fns other than SUBSTRING, CHAR_LENGTH
> 4) A lot of time stuff, like x + INTERVAL, FLOOR(__time TO MONTH) = x, etc.
> 5) Query-time lookups
> 6) SELECT with pagination – only the first 1000 results are used
> 7) Any sort of memory usage controls on the Broker side
>
> FAQ
>
> 1) Why another SQL-on-Druid thing? There's already, like, 7 of them.
>
> I think the fact that there are 7 of them means there's clearly some value in
> having a built-in implementation. Partially this is so we can hopefully share
> some work between the projects.
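[Forwarder's aside: the semi-join strategy in item 11 of the "things that work" list can be sketched as a toy Python simulation over in-memory rows. The `semi_join` helper and the sample data are invented for illustration; the real implementation rewrites the outer native Druid query with an "in"-style filter.]

```python
# Toy simulation of item 11: SELECT ... WHERE dim IN (SELECT ...) executed by
# materializing the inner result on the Broker and applying it to the outer
# query as a filter. The tables and helper are invented for illustration.

outer_rows = [
    {"user": "alice", "edits": 10},
    {"user": "bob", "edits": 3},
    {"user": "carol", "edits": 7},
]
inner_rows = [{"user": "alice"}, {"user": "carol"}]

def semi_join(outer, inner, key):
    # Step 1: run the inner query and materialize its result on the Broker.
    in_set = {row[key] for row in inner}
    # Step 2: rewrite the outer query with an "in"-style filter and run it.
    return [row for row in outer if row[key] in in_set]

result = semi_join(outer_rows, inner_rows, "user")
print([r["user"] for r in result])  # ['alice', 'carol']
```

This only works when the inner result is small enough to hold (and ship) as a filter, which is why the email hedges that it "works sometimes".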
> Partially this is because Druid doesn't
> support some things that are needed for well-rounded SQL support (like
> multi-column extraction fns, aggregations of expressions, etc.), and having the
> SQL layer inside the Druid repo will make it possible to develop those sorts
> of features hand in hand with the SQL planner rules.
>
> Btw, the 7 that I counted are, in alphabetical order: Calcite
> (https://calcite.apache.org/docs/druid_adapter.html), Drill
> (https://groups.google.com/d/msg/druid-development/FK5D162ao74/EnYDjASWCQAJ),
> Druid's own simple grammar (added in 2013, removed in
> https://github.com/druid-io/druid/pull/2090), Hive
> (https://cwiki.apache.org/confluence/display/Hive/Druid+Integration), PlyQL
> (http://plywood.imply.io/plyql), Sparkline
> (https://github.com/SparklineData/spark-druid-olap), and Sql4D
> (https://github.com/srikalyc/Sql4D).
>
> 2) Is the proposed SQL language actually SQL or is it "SQL-like"?
>
> In terms of what can be efficiently pushed down to Druid queries, it's
> "SQL-like". A lot of common SQL features aren't supported – although I think
> it makes sense to add more over time. Technically Calcite does speak full
> SQL, but at the start a lot of it would get planned as pulling all the raw
> data into the Broker and processing it in Calcite's interpreter.
>
> 3) Why not use the Druid adapter in Calcite?
>
> Calcite's Druid adapter doesn't depend on any Druid jars; it implements the
> query language and protocol using its own set of model and client classes.
> For a built-in approach I wanted to be able to use Druid's own Query,
> DimFilter, ExtractionFn, etc. in a type-safe way, and wanted to use the query
> code that already exists in Druid for discovering and querying data nodes. I
> think this will also help speed up development of Druid features that allow
> more SQL to be pushed down.
>
> 4) Can we share work between a built-in Druid SQL and the other SQL-on-Druid
> adapters that people are working on?
>
> Hopefully! I think it would make sense if a built-in Druid SQL could be used
> for whatever Druid supports natively, and external SQL-on-Druid adapters
> could be used when users want to do something that Druid doesn't support.
> Sharing the work needed to translate "whatever Druid supports natively" into
> Druid queries would help everyone.
>
> Hive and Drill already use Calcite internally, and I hope it's workable to
> stuff Druid's own rules into their planners without changing too much. If
> those projects are comfortable embedding druid-server then that should work
> straight away. If they aren't comfortable embedding druid-server (perhaps
> understandably) then we could bite the bullet and work on a lighter-weight
> druid-client jar that has just enough to give us the benefit of type
> checking, and does not include all the heavy Druid functionality.
>
> If you're working on one of those projects, feedback is greatly appreciated.
>
> 5) What happens when parts of the SQL query can't be converted to a native
> Druid query?
>
> Calcite is rad and runs the parts that can't be pushed down through an
> interpreter on the Druid Broker. Of course this means that if you use
> constructs that are close to the data and can't be pushed down, like grouping
> on CONCAT(foo, bar) or aggregating SUM(3 * bar), a surprisingly large amount
> of data can be pulled out into the Broker. This is not great behavior and
> something should be done about it…
>
> 6) What about JOINs?
> I don't know; maybe it makes sense for Druid to have query types usable for
> joins in the future. But it doesn't now; the closest thing is query-time
> lookups, which are like a broadcast join. Without native join support in
> Druid, it makes more sense to pull data out of Druid into another system
> (like Drill or Hive or Spark) and do the join there. Even if Druid did
> support native joins, there's still some value in using an external execution
> engine to join Druid data with data from some other system. Filters and
> aggregations can still potentially be pushed down, depending on the query.
>
> 7) JDBC works, but what about ODBC?
>
> Avatica's home page says work on an ODBC client has not yet started. The page
> at https://hortonworks.com/hadoop-tutorial/bi-apache-phoenix-odbc/ is
> interesting, since the Phoenix Query Server is also Avatica-based, so maybe
> that work could be useful? However, it doesn't seem to be open source, and
> when I tried to get the binary to work, the Windows ODBC setup tool crashed
> after calling getTableTypes. Maybe someone at Hortonworks can comment :)
>
> Gian
>
> --
> You received this message because you are subscribed to the Google Groups
> "Druid Development" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/druid-development/CACZNdYAVv8WLP1Qw4dzzf60e-3CwzP_%3DFmWeOd_2OvPsfcw8Ag%40mail.gmail.com.
> For more options, visit https://groups.google.com/d/optout.
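[Forwarder's aside: the "broadcast join" analogy for query-time lookups in Q6 can be illustrated with a toy Python sketch. A small key-to-value table is "broadcast" to every data node and applied to each fact row while scanning; all names and data here are invented, and the fallback-to-original-key behavior is just one of the ways a lookup can treat missing keys.]

```python
# Toy illustration of why query-time lookups resemble a broadcast join:
# a small key -> value table is shipped ("broadcast") to every data node
# and applied to each fact row during scanning. Data here is invented.

lookup = {"US": "United States", "FR": "France"}  # small broadcast table

fact_rows = [
    {"country_code": "US", "edits": 5},
    {"country_code": "FR", "edits": 2},
    {"country_code": "DE", "edits": 1},
]

def apply_lookup(rows, lookup, src, dst):
    # Each data node maps the dimension through the lookup while scanning;
    # here unmatched keys fall back to the original value.
    return [dict(row, **{dst: lookup.get(row[src], row[src])}) for row in rows]

joined = apply_lookup(fact_rows, lookup, "country_code", "country_name")
print(joined[0]["country_name"])  # United States
```

As with any broadcast join, this only scales while the lookup table stays small enough to replicate everywhere, which is why general joins are better left to an external engine for now.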
