kaijianding commented on a change in pull request #11379:
URL: https://github.com/apache/druid/pull/11379#discussion_r665191750



##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidQuery.java
##########
@@ -1016,6 +1017,42 @@ public GroupByQuery toGroupByQuery()
         grouping.getSubtotals().toSubtotalsSpec(grouping.getDimensionSpecs()),
         ImmutableSortedMap.copyOf(plannerContext.getQueryContext())
     );
+    // in this case, the timestampResult optimization is not necessary

Review comment:
       There are several reasons why I think this optimization is unnecessary in this case.
   
   When `query.isApplyLimitPushDown()` is true, the limit is calculated on the compute nodes first.
   
   `select floor(__time to hour),dim1 from a_table group by floor(__time to 
hour),dim1 limit 10`
   On a compute node, the number of rows produced by the limit with granularity=ALL differs from the sum of the rows produced by applying the limit within each hour. The `limit` must be applied to the whole result set, not per time bucket.
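   To illustrate the point above, here is a standalone sketch (not Druid code; the row data, class name, and method names are all hypothetical) showing that a single limit over the whole result set does not produce the same row count as summing per-hour limited results:

   ```java
   import java.util.*;
   import java.util.stream.*;

   public class LimitGranularityDemo {
       // LIMIT applied once over the entire result set, as with granularity=ALL.
       static long limitOverAll(List<String[]> rows, int limit) {
           return rows.stream().limit(limit).count();
       }

       // LIMIT applied within each time bucket and then summed, which is how
       // rows would survive if granularity=HOUR were substituted on the
       // compute node before the limit.
       static long limitPerBucket(List<String[]> rows, int limit) {
           return rows.stream()
               .collect(Collectors.groupingBy(r -> r[0]))   // group by hour bucket
               .values().stream()
               .mapToLong(bucket -> bucket.stream().limit(limit).count())
               .sum();
       }

       public static void main(String[] args) {
           // Hypothetical grouped rows (hourBucket, dim1): 3 hours x 6 dims = 18 rows.
           List<String[]> rows = new ArrayList<>();
           for (int hour = 0; hour < 3; hour++) {
               for (int d = 0; d < 6; d++) {
                   rows.add(new String[]{"hour-" + hour, "dim-" + d});
               }
           }
           // Global limit 10 keeps 10 rows; per-hour limit 10 cuts nothing,
           // since each bucket has only 6 rows, so all 18 rows survive.
           System.out.println(limitOverAll(rows, 10));
           System.out.println(limitPerBucket(rows, 10));
       }
   }
   ```

   The two counts disagree, which is why the limit cannot simply be pushed inside the finer granularity.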
   
   `select dim1,floor(__time to hour),dim2 from a_table group by 
dim1,floor(__time to hour),dim2 limit 10`
   In this even more complicated case, if we replaced granularity=ALL with granularity=HOUR, we would have to modify the Buffer Grouper logic to adjust the time_floor dimension value (and its position in the row) so that each result row looks the same as it would with granularity=ALL, preserving not only the values but also the ordering of the rows after the `limit` is applied. I don't think that is worth it, in either logic complexity or performance gain.
   
   Another reason: when the `limit` is pushed down to the compute nodes, the whole computation should already be fast enough that the timestampResult optimization would not meaningfully improve total performance.
   
   And the last reason: if we did not push down the `limit` to the compute nodes, applied this optimization, and performed the limit on the broker, I suspect total performance would actually degrade.
   
   In general, the timestampResult optimization logic happens on the broker, so I disable this optimization when `query.isApplyLimitPushDown()` is true.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


