[PR] Fixing nested group by query with order by in outer query (druid)

via GitHub Mon, 23 Oct 2023 16:43:02 -0700


somu-imply opened a new pull request, #15237:
URL: https://github.com/apache/druid/pull/15237


   Consider this query:
   ```
   with t AS (SELECT distinct(m2) as mo, APPROX_COUNT_DISTINCT(m1) / ( TIMESTAMPDIFF(DAY,min( __time ), CURRENT_TIMESTAMP  ) + 0.0000000001) as trend_score
   FROM "foo"
   GROUP BY 1
   ORDER BY trend_score DESC
   LIMIT 2)
   select mo, (MAX(trend_score)) from t
   where mo > 2
   GROUP BY 1 
   ORDER BY 2 DESC
   ```
   This currently fails to plan with the following stack trace:
   
   
   ```
   Caused by: org.apache.calcite.plan.RelOptPlanner$CannotPlanException: There are not enough rules to produce a node with desired properties: convention=DRUID, sort=[1 DESC].
   Missing conversion is LogicalSort[convention: NONE -> DRUID]
   There is 1 empty subset: rel#126:RelSubset#7.DRUID.[1 DESC], the relevant part of the original plan is as follows
   124:LogicalSort(sort0=[$1], dir0=[DESC], fetch=[1001:BIGINT])
     120:LogicalFilter(subset=[rel#128:RelSubset#5.NONE.[]], condition=[>($0, 2)])
       118:LogicalSort(subset=[rel#119:RelSubset#4.NONE.[1 DESC]], sort0=[$1], dir0=[DESC], fetch=[2])
         116:LogicalProject(subset=[rel#117:RelSubset#3.NONE.[]], mo=[$0], trend_score=[/($1, +(CAST(/INT(Reinterpret(-(2023-10-17 23:46:28.328, $2)), 86400000)):INTEGER NOT NULL, 1E-10:DECIMAL(11, 10)))])
           114:LogicalAggregate(subset=[rel#115:RelSubset#2.NONE.[]], group=[{0}], agg#0=[APPROX_COUNT_DISTINCT($1)], agg#1=[MIN($2)])
             112:LogicalProject(subset=[rel#113:RelSubset#1.NONE.[]], mo=[$5], m1=[$4], __time=[$0])
               63:LogicalTableScan(subset=[rel#111:RelSubset#0.NONE.[]], table=[[druid, foo]])
   
   Root: rel#126:RelSubset#7.DRUID.[1 DESC]
   Original rel:
   LogicalSort(subset=[rel#126:RelSubset#7.DRUID.[1 DESC]], sort0=[$1], dir0=[DESC], fetch=[1001:BIGINT]): rowcount = 1.0, cumulative cost = {1.0 rows, 20.0 cpu, 0.0 io}, id = 124
     LogicalFilter(subset=[rel#128:RelSubset#5.NONE.[]], condition=[>($0, 2)]): rowcount = 1.0, cumulative cost = {1.0 rows, 2.0 cpu, 0.0 io}, id = 120
       LogicalSort(subset=[rel#119:RelSubset#4.NONE.[1 DESC]], sort0=[$1], dir0=[DESC], fetch=[2]): rowcount = 2.0, cumulative cost = {2.0 rows, 200.0 cpu, 0.0 io}, id = 118
         LogicalProject(subset=[rel#117:RelSubset#3.NONE.[]], mo=[$0], trend_score=[/($1, +(CAST(/INT(Reinterpret(-(2023-10-17 23:46:28.328, $2)), 86400000)):INTEGER NOT NULL, 1E-10:DECIMAL(11, 10)))]): rowcount = 10.0, cumulative cost = {10.0 rows, 20.0 cpu, 0.0 io}, id = 116
           LogicalAggregate(subset=[rel#115:RelSubset#2.NONE.[]], group=[{0}], agg#0=[APPROX_COUNT_DISTINCT($1)], agg#1=[MIN($2)]): rowcount = 10.0, cumulative cost = {12.5 rows, 0.0 cpu, 0.0 io}, id = 114
             LogicalProject(subset=[rel#113:RelSubset#1.NONE.[]], mo=[$5], m1=[$4], __time=[$0]): rowcount = 100.0, cumulative cost = {100.0 rows, 300.0 cpu, 0.0 io}, id = 112
               LogicalTableScan(subset=[rel#111:RelSubset#0.NONE.[]], table=[[druid, foo]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 63
   
           at org.apache.calcite.plan.volcano.RelSubset$CheapestPlanReplacer.visit(RelSubset.java:718) ~[calcite-core-1.35.0.jar:1.35.0]
           at org.apache.calcite.plan.volcano.RelSubset.buildCheapestPlan(RelSubset.java:391) ~[calcite-core-1.35.0.jar:1.35.0]
           at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:533) ~[calcite-core-1.35.0.jar:1.35.0]
           at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:317) ~[calcite-core-1.35.0.jar:1.35.0]
           at org.apache.calcite.tools.Programs$SequenceProgram.run(Programs.java:337) ~[calcite-core-1.35.0.jar:1.35.0]
           at org.apache.druid.sql.calcite.planner.CalcitePlanner.transform(CalcitePlanner.java:452) ~[druid-sql-29.0.0-SNAPSHOT.jar:29.0.0-SNAPSHOT]
           at org.apache.druid.sql.calcite.planner.QueryHandler.planWithDruidConvention(QueryHandler.java:590) ~[druid-sql-29.0.0-SNAPSHOT.jar:29.0.0-SNAPSHOT]
           at org.apache.druid.sql.calcite.planner.QueryHandler$SelectHandler.planForDruid(QueryHandler.java:738) ~[druid-sql-29.0.0-SNAPSHOT.jar:29.0.0-SNAPSHOT]
           at org.apache.druid.sql.calcite.planner.QueryHandler.plan(QueryHandler.java:220) ~[druid-sql-29.0.0-SNAPSHOT.jar:29.0.0-SNAPSHOT]
           ... 84 more
   ```
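   This PR fixes planning by removing the aggregate remove rule from the planner. With that rule active, the outer aggregate is dropped (the plan in the trace has no aggregate above the inner `LIMIT`), so the query is planned as a scan over a group by with an order by on a non-time column, which Druid does not support.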
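
The rewrite in question can be illustrated without Calcite. The inner query returns one row per `mo`, so the outer `GROUP BY 1` with `MAX(trend_score)` cannot merge any rows, and a rule that removes redundant aggregates may drop it, leaving only a filter and a sort over the inner group by. The Java sketch below models rows as plain `(mo, trend_score)` entries and compares the two forms. The class and method names are hypothetical and the sample values are made up for illustration; none of this is Druid or Calcite code.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the outer query, not Druid or Calcite code.
class AggregateRemoveSketch {

    // Outer query as written: WHERE mo > 2 GROUP BY mo with MAX(trend_score), ORDER BY 2 DESC.
    static List<Map.Entry<Double, Double>> withOuterAggregate(List<Map.Entry<Double, Double>> inner) {
        Map<Double, Double> maxByMo = new LinkedHashMap<>();
        for (Map.Entry<Double, Double> row : inner) {
            if (row.getKey() > 2) {
                maxByMo.merge(row.getKey(), row.getValue(), Math::max);
            }
        }
        List<Map.Entry<Double, Double>> out = new ArrayList<>();
        maxByMo.forEach((mo, score) -> out.add(new SimpleEntry<>(mo, score)));
        out.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
        return out;
    }

    // Same query with the aggregate removed: only the filter and the sort remain.
    static List<Map.Entry<Double, Double>> withoutOuterAggregate(List<Map.Entry<Double, Double>> inner) {
        List<Map.Entry<Double, Double>> out = new ArrayList<>();
        for (Map.Entry<Double, Double> row : inner) {
            if (row.getKey() > 2) {
                out.add(new SimpleEntry<>(row.getKey(), row.getValue()));
            }
        }
        out.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
        return out;
    }

    public static void main(String[] args) {
        // Illustrative inner result: already unique on mo, as GROUP BY 1 in the CTE guarantees.
        List<Map.Entry<Double, Double>> inner = Arrays.asList(
                new SimpleEntry<>(3.0, 0.5),
                new SimpleEntry<>(6.0, 0.9));
        // Both forms return the same rows when the input is unique on mo.
        System.out.println(withOuterAggregate(inner).equals(withoutOuterAggregate(inner)));
    }
}
```

Because the two forms return the same rows, keeping the aggregate in the plan should not change results; it only avoids the plan shape that Druid cannot execute.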
   
   This PR has:
   
   - [ ] been self-reviewed.
      - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] a release note entry in the PR description.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [ ] added comments explaining the "why" and the intent of the code wherever it would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   

