Re: Embed druid-sql inside Calcite?

Nishant Bangarwa Wed, 07 Feb 2018 09:07:37 -0800

Having a focused effort on a single project would be great and would
definitely help us evolve Druid SQL capabilities faster.


1) One more thing we need to consider is that Calcite's Druid adapter is
also used in Apache Hive, where we use the Druid rules to generate an
optimized plan and then execute the Druid query from Druid containers. In
druid-sql, I believe the query execution logic is tied to the assumption
that the execution node is a Druid broker, where native queries can be run
to generate a Sequence of results. We might need some rework there to
ensure that Hive still works after the proposed changes.

2) druid-sql's dependencies could probably be reduced by separating its
planning and execution logic. The planning logic need not depend on much
Druid code and can have lightweight dependencies, while the execution part
and result serde, which pull in lots of Druid dependencies, can reside in a
separate module that the Calcite Druid adapter need not depend on.
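To make (2) a bit more concrete, here is a rough sketch of the split. The interface and class names (NativeQueryPlanner, NativeQueryExecutor, CountStarPlanner) are hypothetical and do not exist in druid-sql today; the toy planner only handles a single query shape and emits a native timeseries query as JSON.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these types exist in druid-sql 0.11.0.
public class PlannerExecutorSplit {

  // Lightweight planning module: SQL in, native Druid query JSON out.
  // This is the only side calcite-druid would need to depend on.
  interface NativeQueryPlanner {
    String plan(String sql);
  }

  // Heavier execution module: runs a native query and deserializes results.
  // A Druid broker and Hive could each provide their own implementation.
  interface NativeQueryExecutor {
    List<Map<String, Object>> execute(String nativeQueryJson);
  }

  // Toy planner that only understands "SELECT COUNT(*) FROM <table>" and
  // emits a native timeseries query with a single count aggregator.
  static final class CountStarPlanner implements NativeQueryPlanner {
    @Override
    public String plan(String sql) {
      String prefix = "SELECT COUNT(*) FROM ";
      if (!sql.startsWith(prefix)) {
        throw new UnsupportedOperationException("toy planner only handles COUNT(*): " + sql);
      }
      String table = sql.substring(prefix.length()).trim();
      return "{\"queryType\":\"timeseries\",\"dataSource\":\"" + table + "\","
          + "\"granularity\":\"all\","
          + "\"intervals\":[\"1000-01-01/3000-01-01\"],"
          + "\"aggregations\":[{\"type\":\"count\",\"name\":\"EXPR$0\"}]}";
    }
  }

  public static void main(String[] args) {
    NativeQueryPlanner planner = new CountStarPlanner();
    String json = planner.plan("SELECT COUNT(*) FROM wikipedia");

    // Stub executor returning a canned row; it stands in for a broker-side
    // or Hive-side implementation living in the separate execution module.
    NativeQueryExecutor executor = q -> List.of(Map.of("EXPR$0", 0L));

    System.out.println(json);
    System.out.println(executor.execute(json));
  }
}
```

The point of the shape is that the planner's output (native query JSON) is the only contract between the two modules, so the execution side can differ between a Druid broker and Hive without calcite-druid pulling in broker code.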

I think the hypothetical case you mentioned is also worth considering. To
ease the development process, we could move calcite-druid into Druid as a
module, so that we release druid-sql and the Calcite adapter together.

On Wed, 7 Feb 2018 at 09:02 Gian Merlino wrote:

> Hi Calcites,
>
> I would like to raise the idea of adding druid-sql
> (http://search.maven.org/#artifactdetails%7Cio.druid%7Cdruid-sql%7C0.11.0%7Cjar)
> as a dependency in Calcite's Druid adapter. It should reduce the size of
> calcite-druid substantially, since it would mostly just be calling into
> druid-sql.
>
> This has some advantages for both projects.
>
> 1) Support for new Druid features often appears in Druid SQL first. By
> embedding druid-sql, Calcite gets these new features too, without extra
> work. For example, https://issues.apache.org/jira/browse/CALCITE-2170 is an
> outstanding JIRA issue to add support for Druid expressions to Calcite, but
> druid-sql already supports them. In fact, it looks like some of the code in
> the proposed patch is copied from druid-sql. As another example,
> https://issues.apache.org/jira/browse/CALCITE-2077 switched table scans
> from "select" to "scan" queries, which had previously been done in Druid SQL in
> https://github.com/druid-io/druid/pull/4751.
>
> 2) Depending on druid-sql means Calcite doesn't need to implement its own
> Druid query and result serde code. Druid already has it.
>
> 3) Focused effort on a single module rather than the split effort that we
> have today, where some developers are contributing to druid-sql and some
> are contributing to calcite-druid.
>
> 4) More test coverage for both projects, presumably.
>
> I think (3) and (4) especially would give us the opportunity to improve
> both projects much more rapidly.
>
> However, there are also some possible disadvantages.
>
> 1) druid-sql is a somewhat heavyweight module. It pulls in a lot of other
> Druid code. Calcite users may prefer a lighter-weight module.
>
> 2) druid-sql's APIs are not intended to be stable, and probably never will
> be. They may break on minor releases. So updating the version of druid-sql
> in Calcite may involve tweaking how functions are called, etc. I think this
> effort should be minimal if calcite-druid is mostly just delegating to
> druid-sql.
>
> 3) druid-sql depends on calcite-core. This should usually be fine, but it
> means that if calcite-core has a breaking change, then calcite-druid cannot
> update its version of druid-sql until druid-sql first updates its version
> of calcite-core.
>
> Despite these potential difficulties, I think the potential benefit means
> this is worth exploring.
>
> Finally, a hypothetical: why not do it the other way around and have Druid add
> calcite-druid as a dependency? The main reason is that this makes the Druid
> development process awkward when a new Druid SQL feature also requires a
> new native query feature. Today, we develop the native query and SQL sides
> together. If Druid depended on calcite-druid, then we would need to develop
> the native query side first, then release it, then update Calcite's Druid
> adapter, then pull that back into Druid. Generally, just adding an extra
> rule in druid-sql wouldn't be enough, since the sorts of changes we are
> making at this point are typically more extensive than just adjusting
> rules.
>
> Gian
>
