I think druid-sql could support the Hive use case without too much reworking. It has a method that returns a Sequence:

public abstract Sequence<Object[]> runQuery();

But it also has another method that returns the Druid query, and Hive would probably call that one:

public DruidQuery toDruidQuery()

Additionally, I guess Hive doesn't want to push "HAVING" and "ORDER BY" down to Druid, so it should avoid adding those rules. There is enough flexibility in druid-sql for that (push down of where, group by, having, and order by are all implemented as separate rules).

About reducing dependencies -- it would be tough, since druid-sql's planning logic also uses Druid model classes (like ExtractionFn, Query, etc.) as part of its rules, and so it depends on druid-processing pretty deeply. Hopefully that would be acceptable to current users of calcite-druid. I think it does have a big advantage: by using Druid's own model classes, there is no need to implement serde and query validation twice.

> I think, the hypothetical case you mentioned is also worth considering, to ease up the development process, we can consider moving calcite-druid as a module in druid, so that we make release of both druid-sql and calcite-adapter together.

By this: do you mean you're considering removing calcite-druid altogether? So, if someone wants to use Calcite with Druid, they should depend on druid-sql (or druid-calcite or whatever) rather than calcite-druid?

Gian

On Wed, Feb 7, 2018 at 9:07 AM, Nishant Bangarwa <[email protected]> wrote:

> Having a focused effort into a single project would be great and would definitely help us in evolving druid sql capabilities faster.
>
> 1) One more thing that we need to consider here is that calcite druid-adapter is also used in Apache Hive where we use the druid rules to generate an optimized plan and then the druid query is executed from druid containers. In druid-sql I believe the query execution logic is tied to the fact that execution node is a druid-broker where native queries can be run to generate a Sequence of results.
> We might need some rework there to ensure that things work fine with hive too after proposed changes.
>
> 2) druid-sql dependencies can probably be reduced by separating the planning and execution logic in druid-sql, the planning logic need not depend on lots of druid code and can have light-weight dependencies while the execution part and result serde which pulls in lots of druid dependencies can reside in separate module and calcite druid-adapter need not depend on that module.
>
> I think, the hypothetical case you mentioned is also worth considering, to ease up the development process, we can consider moving calcite-druid as a module in druid, so that we make release of both druid-sql and calcite-adapter together.
>
> On Wed, 7 Feb 2018 at 09:02 Gian Merlino <[email protected]> wrote:
>
> > Hi Calcites,
> >
> > I would like to raise the idea of adding druid-sql (http://search.maven.org/#artifactdetails%7Cio.druid%7Cdruid-sql%7C0.11.0%7Cjar) as a dependency in Calcite's Druid adapter. It should reduce the size of calcite-druid substantially, since it would mostly just be calling into druid-sql.
> >
> > This has some advantages for both projects.
> >
> > 1) Support for new Druid features often appears in Druid SQL first. By embedding druid-sql, Calcite gets these new features too, without extra work. For example https://issues.apache.org/jira/browse/CALCITE-2170 is an outstanding jira to add support for Druid expressions to Calcite, but druid-sql already supports these. In fact it looks like some of the code in the proposed patch is copied from druid-sql. As another example, https://issues.apache.org/jira/browse/CALCITE-2077 switched table scans from "select" to "scan", which had been previously done in Druid SQL in https://github.com/druid-io/druid/pull/4751.
> >
> > 2) Depending on druid-sql means Calcite doesn't need to implement its own Druid query and result serde code.
> > Druid already has it.
> >
> > 3) Focused effort on a single module rather than the split effort that we have today, where some developers are contributing to druid-sql and some are contributing to calcite-druid.
> >
> > 4) More test coverage for both projects, presumably.
> >
> > I think (3) and (4) especially would give us the opportunity to improve both projects much more rapidly.
> >
> > However, there are also some possible disadvantages.
> >
> > 1) druid-sql is a somewhat heavyweight module. It pulls in a lot of other Druid code. Calcite users may prefer a lighter weight module.
> >
> > 2) druid-sql's APIs are not intended to be stable, and probably never will be. They may break on minor releases. So updating the version of druid-sql in Calcite may involve tweaking how functions are called, etc. I think this effort should be minimal if calcite-druid is mostly just delegating to druid-sql.
> >
> > 3) druid-sql depends on calcite-core. This should usually be fine, but it means that if calcite-core has a breaking change, then calcite-druid cannot update its version of druid-sql until druid-sql first updates its version of calcite-core.
> >
> > Despite these potential difficulties, I think the potential benefit means this is worth exploring.
> >
> > Finally: a hypothetical. Why not do the other way around -- have Druid add calcite-druid as a dependency? The main reason is that this makes the Druid development process awkward when a new Druid SQL feature also requires a new native query feature. Today, we develop the native query and SQL sides together. If Druid depended on calcite-druid, then we would need to develop the native query side first, then release it, then update Calcite's Druid adapter, then pull that back into Druid.
> > Generally, just adding an extra rule in druid-sql wouldn't be enough, since the sorts of changes we are making at this point are typically more extensive than just adjusting rules.
> >
> > Gian
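
For readers of this thread, here is a rough sketch of the two ideas in the reply at the top: planning can hand back a query description without executing it, and each push-down can be a separately registered rule, so a Hive-style caller can simply leave HAVING and ORDER BY out. Everything in it is an assumption made for illustration. The names PushDownSketch, Clause, SketchPlan and SketchPlanner, and the "wikipedia" data source, are hypothetical and are not druid-sql or Calcite APIs.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Illustration only: every type below is a hypothetical stand-in,
// not a druid-sql or Calcite class.
final class PushDownSketch {
    // A Hive-style caller registers only the push-down rules it wants.
    static SketchPlanner hiveStylePlanner() {
        return new SketchPlanner(EnumSet.of(Clause.WHERE, Clause.GROUP_BY));
    }

    public static void main(String[] args) {
        // "wikipedia" is a made-up data source name.
        SketchPlan plan = hiveStylePlanner().plan("wikipedia", EnumSet.allOf(Clause.class));
        System.out.println("pushed to Druid: " + plan.pushedDown);
        System.out.println("left to caller:  " + plan.residual);
    }
}

// One entry per independently registered push-down rule.
enum Clause { WHERE, GROUP_BY, HAVING, ORDER_BY }

// Result of planning: a description of the work, not the rows themselves.
final class SketchPlan {
    final String dataSource;
    final List<String> pushedDown; // clauses the planner hands to Druid
    final List<String> residual;   // clauses the calling engine must evaluate

    SketchPlan(String dataSource, List<String> pushedDown, List<String> residual) {
        this.dataSource = dataSource;
        this.pushedDown = pushedDown;
        this.residual = residual;
    }
}

final class SketchPlanner {
    private final Set<Clause> enabledRules;

    SketchPlanner(Set<Clause> enabledRules) {
        // Defensive copy so later changes by the caller do not affect planning.
        EnumSet<Clause> copy = EnumSet.noneOf(Clause.class);
        copy.addAll(enabledRules);
        this.enabledRules = copy;
    }

    // Planning only: splits the query's clauses by enabled rule, runs nothing.
    SketchPlan plan(String dataSource, Set<Clause> clausesInQuery) {
        List<String> pushed = new ArrayList<>();
        List<String> residual = new ArrayList<>();
        for (Clause clause : Clause.values()) {
            if (!clausesInQuery.contains(clause)) {
                continue;
            }
            if (enabledRules.contains(clause)) {
                pushed.add(clause.name());
            } else {
                residual.add(clause.name());
            }
        }
        return new SketchPlan(dataSource, pushed, residual);
    }
}
```

Running main prints "pushed to Druid: [WHERE, GROUP_BY]" and "left to caller:  [HAVING, ORDER_BY]". Note that the sketch deliberately has no execution step, which is the part the thread says is tied to running on a Druid broker.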
