Hi Stamatis,

Just now getting back to this, but thank you very much for the detailed response! At least it seems like I was on the right track, and you've given me some things to consider. To my knowledge, the druid-sql implementation was partially based on/inspired by the Calcite Druid adapter, but with subsequent refactorings it has started to drift. The biggest difference is that it produces native Java objects rather than a JSON request, but there are still a few recognizable bits.
If I have the chance, I'll see if I can work out a solution to some of the validator errors while I work on this implementation, and contribute back upstream. Thanks again!

Clint

On Thu, Jan 24, 2019 at 12:46 AM Stamatis Zampetakis <[email protected]> wrote:

> Hi Clint,
>
> Thanks for opening this discussion.
>
> Probably you are already aware of this, but the Calcite project already has an adapter for Druid [1] from which you can possibly take some ideas. Having said that, I am not sure whether dynamic parameters are handled somehow by DruidQuery, since I couldn't find anything relevant.
>
> From what I can see, DruidRel also follows the Bindable convention, which means that you have access to the DataContext. During planning and the transformation to the Druid convention you could introduce some placeholders for the dynamic parameters that get populated from the DataContext during the execution of the query (so very similar to how parameters are handled in the Bindable convention).
>
> Other than that, the visitor approach also seems reasonable to me, although it means that you will somehow copy the whole RelNode tree for each execution of the query with different parameters.
>
> Indeed, after binding the parameters there might exist simplifications that can be performed on the RelNode tree, but it may not be a very good idea to do them. Using parameters is beneficial for always having the same execution plan, so as not to have to optimize and compile the same query multiple times.
>
> I don't think it is required to implement CalcitePrepare if you don't need to. It depends very much on what your needs are.
>
> The validator exceptions you observed are indeed annoying. It should be possible to improve this behavior by working on the way the validator infers unknown types [2].
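As an aside, the placeholder idea described above can be mimicked with a small toy sketch. In Calcite, generated Bindable code looks dynamic parameters up in the DataContext under slot names like "?0", "?1"; the class below only imitates that lookup with a plain map. It is a hypothetical stand-in for illustration, not the real org.apache.calcite.DataContext interface:

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for Calcite's DataContext. In Calcite, bound dynamic
// parameters are exposed to generated Bindable code under slot names
// like "?0" and "?1"; this sketch only mimics that lookup with a map.
public class ParamContextSketch {
  private final Map<String, Object> values = new HashMap<>();

  // Bind the value for the n-th '?' in the SQL text.
  public void bind(int ordinal, Object value) {
    values.put("?" + ordinal, value);
  }

  // Resolve a slot name at execution time, as Bindable code would.
  public Object get(String name) {
    return values.get(name);
  }

  public static void main(String[] args) {
    ParamContextSketch ctx = new ParamContextSketch();
    ctx.bind(0, "a");   // first '?' in the statement
    ctx.bind(1, 100L);  // second '?'
    // A Druid-convention placeholder could resolve its value late,
    // at execution time, the same way:
    System.out.println(ctx.get("?0")); // prints a
    System.out.println(ctx.get("?1")); // prints 100
  }
}
```

The appeal of this shape is exactly what the reply points out: the plan stays identical across executions, and only the map contents change.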
>
> Best,
> Stamatis
>
> [1] https://github.com/apache/calcite/tree/master/druid
> [2] https://github.com/apache/calcite/blob/e804521beeec30eccb5ae3a29620f818d0de8427/core/src/main/java/org/apache/calcite/sql/validate/SqlValidatorImpl.java#L1725
>
> On Sat, Jan 5, 2019 at 2:59 AM Clint Wylie <[email protected]> wrote:
>
> > Hi all,
> >
> > I've been investigating/experimenting with adding support for dynamic parameters in Druid (https://github.com/apache/incubator-druid/tree/master/sql), and I've come up with a few questions. I was also wondering if there are any recommendations or best practices on this matter?
> >
> > My first question, I guess, is "how to do it?". In my digging so far it appears there are basically two sides necessary.
> >
> > The first is how to handle binding the values of the dynamic parameters. Druid's SQL planner has two planning paths; both parse the SQL into a SqlNode, validate it, and turn it into an initial rel. From there it tries to "plan with Druid convention", which transforms with Druid rules that build query fragments for the eventual transformation into a native Druid query (or queries) for pushdown; a fallback path uses Bindable convention to support non-Druid queries against the information schema or system tables (or Druid queries that can't be planned into native queries, with a generic scan query, I guess). The Bindable convention path is a non-issue, and in my experiments can just use 'DataContext' and the 'bind' function. But the 'DruidRel' transformation runs into issues because everything is currently blissfully unaware of the existence of 'RexDynamicParam', and so the transformation expects the nodes to be 'RexLiteral' so that the values can be extracted and the native filters and aggregators and such can be constructed.
> > Is the most legitimate way to handle this to make all of these parts able to handle dynamic parameters?
> >
> > Alternative/sneaky ways I can think of to handle binding values for Druid expressions (assuming the parameters all have values to bind at planning time) are either to use a SqlVisitor to walk the SQL tree looking for SqlDynamicParam nodes and constructing a SqlLiteral of the value, or a RelVisitor to walk the rel tree and replace RexDynamicParam with a RexLiteral of the value. Both of these appear to work, with some caveats, since the dynamic parameters are just eliminated before the transformation, which then happens as if everything was a literal in the first place.
> >
> > The SqlVisitor approach is tricky because constructing a SqlLiteral seems tough: SqlTypeName's createSqlLiteral doesn't appear to support the full range of JDBC types that might appear, so this logic would need to be fairly sophisticated.
> >
> > The RelVisitor approach has a more complicated tree to walk, but the actual replacement is easier, since the RexBuilder literal creation function doesn't have the type issue that SQL literals do. Walking at the rel level also appears to miss out on some rel collapses that happen when creating the rel from the validated SQL, so I would need a mechanism to simplify the rel tree after replacing the dynamic params with literals, so that if they were equivalent to any other relational expressions they could be recognized as the same expression. For example, in
> >
> > SELECT
> >   dim1,
> >   COUNT(*) FILTER(WHERE dim2 <> ?)/COUNT(*) AS ratio
> > FROM druid.foo
> > GROUP BY dim1
> > HAVING COUNT(*) FILTER(WHERE dim2 <> ?)/COUNT(*) = 1
> >
> > if param 1 = 'a' and param 2 = 'a', the two `COUNT(*) FILTER(WHERE dim2 <> ?)/COUNT(*)` expressions are not treated as the same, as they would be had they been literals in the first place.
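For what it's worth, the RelVisitor idea above (walk the tree, replace each RexDynamicParam with a RexLiteral of the bound value) follows the classic rewrite-visitor shape. The toy below models it with hypothetical stand-in node types rather than Calcite's RexNode classes; in real Calcite code the analogous hook would be overriding RexShuttle.visitDynamicParam:

```java
import java.util.List;

// Hypothetical stand-ins for RexNode / RexDynamicParam / RexLiteral / RexCall;
// these are NOT Calcite classes, just a minimal tree to show the rewrite.
interface Expr {}
record Literal(Object value) implements Expr {}
record DynamicParam(int ordinal) implements Expr {}
record Call(String op, List<Expr> operands) implements Expr {}

public class ParamReplacer {
  private final List<Object> boundValues;

  public ParamReplacer(List<Object> boundValues) {
    this.boundValues = boundValues;
  }

  // Rebuild the tree, substituting each parameter node with a literal.
  public Expr visit(Expr e) {
    if (e instanceof DynamicParam p) {
      // The substitution step: '?' becomes a plain literal, so downstream
      // code that only understands literals keeps working unchanged.
      return new Literal(boundValues.get(p.ordinal()));
    }
    if (e instanceof Call c) {
      return new Call(c.op(), c.operands().stream().map(this::visit).toList());
    }
    return e; // literals pass through untouched
  }

  public static void main(String[] args) {
    Expr tree = new Call("<>", List.of(new DynamicParam(0), new Literal("b")));
    Expr bound = new ParamReplacer(List.of("a")).visit(tree);
    System.out.println(bound);
  }
}
```

Note the caveat from the thread still applies: this rebuilds a fresh tree per execution, and the result may need a simplification pass so identical post-substitution expressions are recognized as equivalent.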
> > I think a similar simplification would also have to happen if I did add support for RexDynamicParam to appear during the DruidRel transformation, because it would effectively be late-binding the values at that point, after all of the transformations have happened, at least afaict.
> >
> > Are the visitor-based approaches hacky, or missing something important? They seem much less disruptive from my point of view, since all of the Druid conversion code wouldn't have to know how to handle a dynamic parameter, but I'm curious mostly whether there are problems with doing this.
> >
> > The second major part, I think, is adding support for an actual 'prepare' that collects the types of the dynamic parameters encountered in the SQL, to support PreparedStatement on the JDBC side of things. Currently the Druid statements just run 'plan' in the 'prepare' method, which doesn't collect these types and maybe does more work than it should. In my sketches so far I've modified 'DruidPlanner' to have a 'prepare' function that resembles some of the functionality in 'CalcitePrepareImpl', but without doing a full implementation of the 'CalcitePrepare' interface. This appears sufficient to construct the list of AvaticaParameters, and it functions as expected in my tests so far. Is this an adequate way to do this, or is it necessary to implement 'CalcitePrepare' or somehow use 'CalcitePrepareImpl' instead?
> >
> > I've also noticed in a lot of my experiments that I run into SqlValidator exceptions that appear to be related to being unable to infer the type of a dynamic parameter, usually something like:
> >
> > SELECT SUBSTRING(dim2, ?, ?) FROM druid.foo LIMIT ?
> > => Illegal use of dynamic parameter
> >
> > or
> >
> > SELECT
> >   CASE 'foo'
> >     WHEN ? THEN SUM(cnt) / ?
> >     WHEN ? THEN SUM(m1) / ?
> >   END
> > FROM foo
> > => Cannot apply '/' to arguments of type '<BIGINT> / <UNKNOWN>'
> >
> > This can be mitigated by doing something like `CAST(? AS INT)`, giving the operand an explicit type, but that seems like an unfortunate workaround. Is there a way to handle this?
> >
> > Any pointers would be appreciated, thanks!
> >
> > Cheers,
> > Clint
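Two of the points in the thread can be illustrated with one small hypothetical sketch: the 'prepare' side needs to know how many parameters a statement declares before it can report one AvaticaParameter per placeholder, and the validator errors can be worked around by giving each '?' an explicit CAST. This is a naive textual scan for illustration only; Calcite derives parameter counts and types from the validator (e.g. SqlValidator.getParameterRowType), not from the SQL text:

```java
// Hypothetical sketch, not Druid or Calcite code: a naive textual scan
// that (1) counts '?' placeholders, a rough stand-in for what 'prepare'
// needs before building parameter metadata, and (2) applies the
// CAST(? AS <type>) workaround so the validator sees a concrete type.
public class ParamSketch {

  // Counts '?' placeholders, skipping quoted string literals so a
  // literal '?' inside '...' is not miscounted.
  public static int countPlaceholders(String sql) {
    int count = 0;
    boolean inString = false;
    for (char c : sql.toCharArray()) {
      if (c == '\'') {
        inString = !inString;
      } else if (c == '?' && !inString) {
        count++;
      }
    }
    return count;
  }

  // Wraps the i-th '?' in CAST(? AS types[i]) to give it an explicit type.
  public static String addCasts(String sql, String... types) {
    StringBuilder out = new StringBuilder();
    int param = 0;
    boolean inString = false;
    for (char c : sql.toCharArray()) {
      if (c == '\'') {
        inString = !inString;
      }
      if (c == '?' && !inString) {
        out.append("CAST(? AS ").append(types[param++]).append(')');
      } else {
        out.append(c);
      }
    }
    return out.toString();
  }

  public static void main(String[] args) {
    String sql = "SELECT SUBSTRING(dim2, ?, ?) FROM druid.foo LIMIT ?";
    System.out.println(countPlaceholders(sql)); // prints 3
    System.out.println(addCasts(sql, "INT", "INT", "INT"));
  }
}
```

The real fix the thread points at is improving unknown-type inference in SqlValidatorImpl (link [2] above), so that the explicit casts become unnecessary.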
