Hi Stamatis,

Just now getting back to this, but thank you very much for the detailed
response! At least it seems like I was on the right track, and you've given
me some things to consider. To my knowledge, the druid-sql implementation
was partially based on/inspired by the Calcite Druid adapter, but with
subsequent refactorings it has started to drift. The biggest difference is
that it produces native Java objects rather than a JSON request, but there
are still a few recognizable bits.

If I have the chance, I'll see if I can work out a solution to some of the
validator errors while I work on this implementation, and contribute back
upstream.

Thanks again!

Clint


On Thu, Jan 24, 2019 at 12:46 AM Stamatis Zampetakis <[email protected]>
wrote:

> Hi Clint,
>
> Thanks for opening this discussion.
>
> Probably you are already aware of this, but the Calcite project already
> has an adapter for Druid [1] from which you can possibly take some ideas.
> Having said that, I am not sure if dynamic parameters are handled somehow
> by DruidQuery, since I couldn't find anything relevant.
>
> From what I can see DruidRel also follows the Bindable convention, which
> means that you have access to the DataContext. During planning and the
> transformation to the Druid convention you could introduce placeholders
> for the dynamic parameters that get populated from the DataContext during
> the execution of the query (so very similar to how parameters are handled
> in the Bindable convention).
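A minimal sketch of that placeholder idea, under the assumption that the engine can hold on to a parameter's ordinal until execution. The types here are toy stand-ins: the real interface is org.apache.calcite.DataContext, and Calcite's Bindable convention exposes bound parameter values under the names "?0", "?1", and so on.

```java
// Minimal stand-in for Calcite's DataContext lookup (assumption: real
// code would use org.apache.calcite.DataContext, which stores bound
// dynamic-parameter values under the names "?0", "?1", ...).
interface ParamContext {
  Object get(String name);
}

// Hypothetical placeholder introduced during planning; it defers
// reading the parameter value until the query is executed.
class DynamicParamPlaceholder {
  private final int ordinal;

  DynamicParamPlaceholder(int ordinal) {
    this.ordinal = ordinal;
  }

  /** Resolves the bound value from the context at execution time. */
  Object resolve(ParamContext context) {
    return context.get("?" + ordinal);
  }
}
```

At execution time each placeholder resolves against the same context the Bindable path already uses, so the plan itself never needs literal substitution and can be reused across executions.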
>
> Other than that, the visitor approach also seems reasonable to me,
> although it means that you will somehow copy the whole RelNode tree for
> each execution of the query with different parameters.
>
> Indeed, after binding the parameters there might exist simplifications
> that could be performed on the RelNode tree, but it may not be a very
> good idea to do them. Using parameters is beneficial for always having
> the same execution plan, so you do not have to optimize and compile the
> same query multiple times.
>
> I don't think it is required to implement CalcitePrepare if you don't
> need to. It depends very much on what your needs are.
>
> The validator exceptions you observed are indeed annoying. It should be
> possible to improve this behavior by working on the way the validator
> infers unknown types [2].
>
> Best,
> Stamatis
>
> [1] https://github.com/apache/calcite/tree/master/druid
> [2]
>
> https://github.com/apache/calcite/blob/e804521beeec30eccb5ae3a29620f818d0de8427/core/src/main/java/org/apache/calcite/sql/validate/SqlValidatorImpl.java#L1725
>
> On Sat, Jan 5, 2019 at 2:59 AM, Clint Wylie <[email protected]>
> wrote:
>
> > Hi all,
> >
> > I've been investigating/experimenting with adding support for dynamic
> > parameters in Druid (
> > https://github.com/apache/incubator-druid/tree/master/sql), and I've
> > come up with a few questions. I was also wondering if there are any
> > recommendations or best practices on this matter.
> >
> > My first question, I guess, is "how to do it?". In my digging so far,
> > it appears there are basically two sides necessary.
> >
> > The first is how to handle binding the values of the dynamic
> > parameters. Druid's SQL planner has two planning paths; both parse the
> > SQL into an SqlNode, validate it, and turn it into an initial rel. From
> > there it tries to "plan with Druid convention", which transforms with
> > Druid rules that build query fragments for the eventual transformation
> > into native Druid queries for pushdown; a fallback path uses the
> > Bindable convention to support non-Druid queries against the
> > information schema or system tables (or Druid queries that can't be
> > planned into native queries with a generic scan query, I guess). The
> > Bindable convention path is a non-issue, and in my experiments can just
> > use 'DataContext' and the 'bind' function. But the 'DruidRel'
> > transformation runs into issues because everything is currently
> > blissfully unaware of the existence of 'RexDynamicParam', and so the
> > transformation expects the nodes to be 'RexLiteral' so that the values
> > can be extracted and the native filters, aggregators, and such
> > constructed. Is the most legitimate way to handle this to make all of
> > these parts able to handle dynamic parameters?
> >
> > Alternative/sneaky ways I can think of to handle binding values for
> > Druid expressions would be (assuming the parameters all have values to
> > bind at planning time) to either use an SqlVisitor to walk the SQL tree
> > looking for SqlDynamicParam and constructing an SqlLiteral of the
> > value, or a RelVisitor to walk the rel tree and replace RexDynamicParam
> > with a RexLiteral of the value. Both of these appear to work, with some
> > caveats, since the dynamic parameters are just eliminated before the
> > transformation, which then happens as if everything was a literal in
> > the first place.
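The rel-tree variant can be sketched with a toy expression tree standing in for the Rex hierarchy. These node types are invented for illustration; real Calcite code would subclass RexShuttle, override its visitDynamicParam method, and build the replacement with RexBuilder.

```java
import java.util.List;
import java.util.Map;

// Toy stand-ins for RexNode / RexDynamicParam / RexLiteral / RexCall.
interface Node {}
record Param(int ordinal) implements Node {}
record Literal(Object value) implements Node {}
record Call(String op, List<Node> operands) implements Node {}

// Copies the tree, swapping each Param for a Literal of its bound
// value; everything else passes through unchanged.
class ParamReplacer {
  private final Map<Integer, Object> values;

  ParamReplacer(Map<Integer, Object> values) {
    this.values = values;
  }

  Node visit(Node node) {
    if (node instanceof Param p) {
      return new Literal(values.get(p.ordinal()));
    }
    if (node instanceof Call c) {
      return new Call(c.op(), c.operands().stream().map(this::visit).toList());
    }
    return node;
  }
}
```

Because the visitor returns a fresh tree, the original plan is untouched and can be reused; the cost is copying the whole tree per execution, as noted in the reply above.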
> >
> > The SqlVisitor approach is tricky because constructing an SqlLiteral
> > seems tough: SqlTypeName's createSqlLiteral doesn't appear to support
> > the full range of JDBC types that might appear, so this logic would
> > need to be fairly sophisticated.
> >
> > The RelVisitor approach has a more complicated tree to walk, but the
> > actual replacement is easier, since the RexBuilder literal creation
> > function doesn't have the type issue that SQL literals do. Walking at
> > the rel level also appears to miss out on some rel collapses that
> > happen when creating the rel from the validated SQL, so I would need a
> > mechanism to simplify the rel tree after replacing the dynamic params
> > with literals, so that if they were equivalent to any other relational
> > expressions they could be recognized as the same expression. For
> > example, with:
> >
> > SELECT
> >   dim1,
> >   COUNT(*) FILTER(WHERE dim2 <> ?)/COUNT(*) as ratio
> > FROM druid.foo
> > GROUP BY dim1
> > HAVING COUNT(*) FILTER(WHERE dim2 <> ?)/COUNT(*) = 1
> >
> > if param 1 = 'a' and param 2 = 'a', the two occurrences of `COUNT(*)
> > FILTER(WHERE dim2 <> ?)/COUNT(*)` are not treated as the same, as they
> > would have been were they literals in the first place.
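One way to see why the two expressions are only recognizable as equal after binding: Calcite identifies equivalent expressions by their digest strings, and while the parameter ordinals differ, so do the digests. A toy illustration (the digest format here is invented for the example and does not match Calcite's actual RexNode digests):

```java
import java.util.Map;

// Toy digest comparison: with distinct dynamic parameters the two
// occurrences of the expression have different digests, but once both
// parameters are bound to 'a' the digests coincide and the expressions
// could be collapsed into one.
class DigestSketch {
  /** Digest of "COUNT(*) FILTER(WHERE dim2 <> <arg>)/COUNT(*)". */
  static String digest(String arg) {
    return "/(COUNT() FILTER <>($dim2, " + arg + "), COUNT())";
  }

  public static void main(String[] args) {
    // Unbound: SELECT uses ?0, HAVING uses ?1 -> digests differ.
    System.out.println(digest("?0").equals(digest("?1")));

    // Bound: both parameters are 'a' -> digests match.
    Map<String, String> bound = Map.of("?0", "'a'", "?1", "'a'");
    System.out.println(digest(bound.get("?0")).equals(digest(bound.get("?1"))));
  }
}
```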
> >
> > I think a similar simplification would also have to happen if I did add
> > support for RexDynamicParam to appear during the DruidRel
> > transformation, because it would effectively be late binding the values
> > at that point, after all of the transformations have happened, at least
> > AFAICT.
> >
> > Are the visitor-based approaches hacky or missing something important?
> > They seem much less disruptive from my point of view, since none of the
> > Druid conversion code would have to know how to handle a dynamic
> > parameter, but I'm mostly curious whether there are problems with doing
> > this.
> >
> >
> > The second major part, I think, is adding support for an actual
> > 'prepare' that collects the types of the dynamic parameters encountered
> > in the SQL, to support PreparedStatement on the JDBC side of things.
> > Currently the Druid statements just run 'plan' in the 'prepare' method,
> > which doesn't collect these types and maybe does more work than it
> > should. In my sketches so far I've modified 'DruidPlanner' to have a
> > 'prepare' function that resembles some of the functionality in
> > 'CalcitePrepareImpl', but without doing a full implementation of the
> > 'CalcitePrepare' interface. This appears sufficient to construct the
> > list of AvaticaParameters, and it functions as expected in my tests so
> > far. Is this an adequate way to do this, or is it necessary to
> > implement 'CalcitePrepare' or somehow use 'CalcitePrepareImpl' instead?
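The collection step in such a 'prepare' can be sketched as one walk over the tree that records each dynamic parameter's ordinal and inferred type. The node and descriptor types below are hypothetical stand-ins; the real implementation would derive types from the validator's parameter row type and emit Avatica's AvaticaParameter objects instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical stand-ins for the parsed/validated tree nodes.
interface SNode {}
record SParam(int ordinal, String typeName) implements SNode {}
record SCall(List<SNode> operands) implements SNode {}

// Stand-in for Avatica's per-parameter metadata.
record ParamDesc(int ordinal, String typeName) {}

class PrepareSketch {
  /** Walks the tree and returns one descriptor per ordinal, in order. */
  static List<ParamDesc> collect(SNode root) {
    SortedMap<Integer, String> found = new TreeMap<>();
    walk(root, found);
    List<ParamDesc> out = new ArrayList<>();
    found.forEach((ord, type) -> out.add(new ParamDesc(ord, type)));
    return out;
  }

  private static void walk(SNode node, SortedMap<Integer, String> found) {
    if (node instanceof SParam p) {
      found.put(p.ordinal(), p.typeName());
    } else if (node instanceof SCall c) {
      for (SNode operand : c.operands()) {
        walk(operand, found);
      }
    }
  }
}
```

Sorting by ordinal matters because JDBC reports parameter metadata positionally, regardless of the order the parameters are encountered in the tree.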
> >
> >
> > I've also noticed in a lot of my experiments that I run into
> > SqlValidator exceptions that appear to be related to being unable to
> > infer the type of a dynamic parameter, usually something like:
> >
> >     SELECT SUBSTRING(dim2, ?, ?) FROM druid.foo LIMIT ?
> >     => Illegal use of dynamic parameter
> > or
> >     SELECT
> >       CASE 'foo'
> >       WHEN ? THEN SUM(cnt) / ?
> >       WHEN ? THEN SUM(m1) / ?
> >       END
> >     FROM foo
> >     => Cannot apply '/' to arguments of type '<BIGINT> / <UNKNOWN>'
> >
> > which can be mitigated by doing something like `CAST(? AS INT)` to give
> > the operand an explicit type, but that seems like an unfortunate
> > workaround. Is there a way to handle this?
> >
> > Any pointers would be appreciated, thanks!
> >
> > Cheers,
> > Clint
> >
>
