Hi all,
I've been investigating/experimenting with adding support for dynamic
parameters in Druid (
https://github.com/apache/incubator-druid/tree/master/sql), and I've come
up with a few questions. I was also wondering whether there are any
recommendations or best practices on this matter.
My first question, I guess, is "how best to do it?". From my digging so
far, it appears there are basically 2 parts necessary.
The first is how to handle binding the values of the dynamic parameters.
Druid's SQL planner has 2 planning paths; both parse the SQL into an
SqlNode, validate it, and turn it into an initial rel. From there the
planner tries to "plan with druid convention", transforming the rel with
Druid rules that build query fragments for the eventual translation into
one or more native Druid queries for pushdown. The fallback path uses
bindable convention to support non-Druid queries against the information
schema or system tables (or, I guess, Druid queries that can't be planned
into native queries, via a generic scan query). The bindable convention
path is a non-issue, and in my experiments can just use 'DataContext' and
the 'bind' function. But the `DruidRel` transformation runs into issues
because everything there is currently blissfully unaware of the existence
of 'RexDynamicParam', so the transformation expects the nodes to be
'RexLiteral' so that the values can be extracted and the native filters,
aggregators, and so on can be constructed. Is the most legitimate way to
handle this to make all of these parts able to handle dynamic parameters?
Alternative (sneakier) ways I can think of to bind values for Druid
expressions, assuming the parameters all have values to bind at planning
time, are either to use an SqlVisitor to walk the SQL tree looking for
SqlDynamicParam nodes and construct an SqlLiteral of each value, or a
RelVisitor to walk the rel tree and replace each RexDynamicParam with a
RexLiteral of its value. Both of these appear to work, with some caveats,
since the dynamic parameters are simply eliminated before the
transformation, which then happens as if everything were a literal in the
first place.
The SqlVisitor approach is tricky because constructing an SqlLiteral seems
tough since SqlTypeName's createSqlLiteral doesn't appear to support the
full range of JDBC types that might appear, so this logic would need to be
fairly sophisticated.
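To make that type mapping concrete, here's a minimal, self-contained sketch of the kind of logic involved. It doesn't use the Calcite APIs — `toLiteralSql` is just a hypothetical helper — and a real version would have to cover the full range of JDBC types, plus escaping and date/time formats:

```java
import java.sql.Types;

public class ParamLiterals {
  // Hypothetical helper: render a bound parameter value as SQL literal text.
  // A real implementation would need to handle many more JDBC types.
  static String toLiteralSql(int jdbcType, Object value) {
    if (value == null) {
      return "NULL";
    }
    switch (jdbcType) {
      case Types.CHAR:
      case Types.VARCHAR:
        // Escape embedded single quotes by doubling them.
        return "'" + value.toString().replace("'", "''") + "'";
      case Types.INTEGER:
      case Types.BIGINT:
      case Types.DOUBLE:
        return value.toString();
      case Types.BOOLEAN:
        return ((Boolean) value) ? "TRUE" : "FALSE";
      default:
        throw new IllegalArgumentException("unsupported JDBC type: " + jdbcType);
    }
  }

  public static void main(String[] args) {
    System.out.println(toLiteralSql(Types.VARCHAR, "it's"));  // 'it''s'
    System.out.println(toLiteralSql(Types.BIGINT, 42L));      // 42
  }
}
```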
The RelVisitor approach means a more complicated tree to walk, but the
actual replacement is easier, since the RexBuilder literal-creation
functions don't have the type issue that SQL literals do. However, walking
at the rel level also appears to miss out on some rel collapses that happen
when creating the rel from the validated SQL, so I would need a mechanism
to simplify the rel tree after replacing the dynamic params with literals,
so that expressions equivalent to other relational expressions could be
recognized as the same expression. For example, given:
SELECT
dim1,
COUNT(*) FILTER(WHERE dim2 <> ?)/COUNT(*) as ratio
FROM druid.foo
GROUP BY dim1
HAVING COUNT(*) FILTER(WHERE dim2 <> ?)/COUNT(*) = 1
if param 1 = 'a' and param 2 = 'a', the two occurrences of `COUNT(*)
FILTER(WHERE dim2 <> ?)/COUNT(*)` are not treated as the same expression,
as they would have been had they been literals in the first place.
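To illustrate why substituting the parameters before the transformation makes this duplicate-expression problem go away, here's a toy model — not the actual Calcite RexNode/RexShuttle classes, just hypothetical stand-ins with structural equality — showing that two occurrences of the same parameterized expression only compare equal once their parameters are replaced with bound literals:

```java
import java.util.List;
import java.util.stream.Collectors;

public class ParamSubstitution {
  // Toy stand-ins for Rex nodes; Java records give structural equality,
  // loosely analogous to comparing RexNode digests in Calcite.
  interface Node {}
  record Literal(Object value) implements Node {}
  record DynamicParam(int ordinal) implements Node {}
  record Call(String op, List<Node> operands) implements Node {}

  // Shuttle-style rewrite: replace each DynamicParam with the literal
  // bound at its ordinal, recursing through calls.
  static Node bind(Node node, List<Object> values) {
    if (node instanceof DynamicParam p) {
      return new Literal(values.get(p.ordinal()));
    }
    if (node instanceof Call c) {
      List<Node> operands = c.operands().stream()
          .map(o -> bind(o, values))
          .collect(Collectors.toList());
      return new Call(c.op(), operands);
    }
    return node;  // literals pass through unchanged
  }

  public static void main(String[] args) {
    // Model the dim2 <> ? predicate from the SELECT (parameter 0) and the
    // HAVING clause (parameter 1).
    Node select = new Call("NE", List.of(new Literal("dim2"), new DynamicParam(0)));
    Node having = new Call("NE", List.of(new Literal("dim2"), new DynamicParam(1)));

    // Before binding they are distinct expressions...
    assert !select.equals(having);

    // ...but once both parameters bind to 'a' they become structurally
    // identical, so a post-substitution pass could recognize them as one.
    List<Object> values = List.of("a", "a");
    assert bind(select, values).equals(bind(having, values));
    System.out.println("ok");
  }
}
```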
I think a similar simplification would also have to happen if I added
support for RexDynamicParam appearing during the DruidRel transformation,
because that would effectively be late-binding the values after all of the
transformations have happened, at least afaict.
Are the visitor-based approaches hacky, or missing something important?
From my point of view they seem much less disruptive, since none of the
Druid conversion code would have to know how to handle a dynamic parameter,
but I'm mostly curious whether there are problems with doing this.
The 2nd major part, I think, is adding support for an actual 'prepare' that
collects the types of the dynamic parameters encountered in the SQL, to
support PreparedStatement on the JDBC side of things. Currently the Druid
statements just run 'plan' in the 'prepare' method, which doesn't collect
these types and maybe does more work than it should. In my sketches so far
I've modified 'DruidPlanner' to have a 'prepare' function that resembles
some of the functionality in 'CalcitePrepareImpl', but without doing a full
implementation of the 'CalcitePrepare' interface. This appears sufficient
to construct the list of AvaticaParameters, and functions as expected in my
tests so far. Is this an adequate way to do it, or is it necessary to
implement 'CalcitePrepare' or somehow use 'CalcitePrepareImpl' instead?
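As a toy illustration of the ordinal-collection half of prepare — a real implementation would get both the parameter count and the types from the validator rather than from the SQL text — here's a self-contained scanner (not Calcite code) that counts '?' markers while skipping ones inside string literals:

```java
public class ParamScanner {
  // Count '?' parameter markers in a SQL string, ignoring any that appear
  // inside single-quoted string literals. This is only a sketch; a real
  // prepare would take the count and types from the validated query.
  static int countParameters(String sql) {
    int count = 0;
    boolean inString = false;
    for (int i = 0; i < sql.length(); i++) {
      char c = sql.charAt(i);
      if (c == '\'') {
        inString = !inString;  // naive: doesn't handle '' escapes
      } else if (c == '?' && !inString) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    System.out.println(countParameters(
        "SELECT SUBSTRING(dim2, ?, ?) FROM druid.foo LIMIT ?"));  // 3
  }
}
```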
I've also noticed in a lot of my experiments that I run into SqlValidator
exceptions that appear to be related to being unable to infer the type of a
dynamic parameter, usually something like:
SELECT SUBSTRING(dim2, ?, ?) FROM druid.foo LIMIT ?
=> Illegal use of dynamic parameter
or
SELECT
CASE 'foo'
WHEN ? THEN SUM(cnt) / ?
WHEN ? THEN SUM(m1) / ?
END
FROM foo
=> Cannot apply '/' to arguments of type '<BIGINT> / <UNKNOWN>'
which can be mitigated by doing something like `CAST(? as INT)` to give the
operand an explicit type, but that seems like an unfortunate workaround. Is
there a better way to handle this?
Any pointers would be appreciated, thanks!
Cheers,
Clint