Embed druid-sql inside Calcite?

Gian Merlino Tue, 06 Feb 2018 19:33:07 -0800

Hi Calcites,

I would like to raise the idea of adding druid-sql (
http://search.maven.org/#artifactdetails%7Cio.druid%7Cdruid-sql%7C0.11.0%7Cjar)
as a dependency in Calcite's Druid adapter. This should reduce the size of
calcite-druid substantially, since the adapter would mostly just be calling
into druid-sql.


This has some advantages for both projects.

1) Support for new Druid features often appears in Druid SQL first. By
embedding druid-sql, Calcite gets these new features too, without extra
work. For example, https://issues.apache.org/jira/browse/CALCITE-2170 is an
outstanding JIRA issue to add support for Druid expressions to Calcite, but
druid-sql already supports them. In fact, it looks like some of the code in
the proposed patch is copied from druid-sql. As another example,
https://issues.apache.org/jira/browse/CALCITE-2077 switched table scans
from "select" to "scan", a change that Druid SQL had already made in
https://github.com/druid-io/druid/pull/4751.

2) Depending on druid-sql means Calcite doesn't need to implement its own
Druid query and result serde code. Druid already has it.

3) Focused effort on a single module rather than the split effort that we
have today, where some developers are contributing to druid-sql and some
are contributing to calcite-druid.

4) More test coverage for both projects, presumably.

I think (3) and (4) especially would give us the opportunity to improve
both projects much more rapidly.

However, there are also some possible disadvantages.

1) druid-sql is a somewhat heavyweight module. It pulls in a lot of other
Druid code. Calcite users may prefer a lighter-weight module.

2) druid-sql's APIs are not intended to be stable, and probably never will
be. They may break in minor releases. So updating the version of druid-sql
in Calcite may involve tweaking how functions are called, etc. I think this
effort should be minimal if calcite-druid is mostly just delegating to
druid-sql.

3) druid-sql depends on calcite-core. This should usually be fine, but it
means that if calcite-core has a breaking change, then calcite-druid cannot
update its version of druid-sql until druid-sql first updates its version
of calcite-core.

Despite these difficulties, I think the potential benefit makes this worth
exploring.

Finally, a hypothetical: why not do it the other way around and have Druid add
calcite-druid as a dependency? The main reason is that this would make the Druid
development process awkward when a new Druid SQL feature also requires a
new native query feature. Today, we develop the native query and SQL sides
together. If Druid depended on calcite-druid, then we would need to develop
the native query side first, then release it, then update Calcite's Druid
adapter, then pull that back into Druid. Generally, just adding an extra
rule in druid-sql wouldn't be enough, since the sorts of changes we are
making at this point are typically more extensive than just adjusting rules.

Gian
