[GitHub] [druid] jihoonson commented on a change in pull request #11379: improve groupBy query granularity translation with 2x query performance improve when issued from sql layer

GitBox Thu, 01 Jul 2021 22:21:29 -0700


jihoonson commented on a change in pull request #11379:
URL: https://github.com/apache/druid/pull/11379#discussion_r662743259




##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidQuery.java
##########
@@ -977,6 +977,33 @@ public GroupByQuery toGroupByQuery()
       // Cannot handle zero or negative limits.
       return null;
     }
+    Map<String, Object> theContext = plannerContext.getQueryContext();
+
+    Granularity queryGranularity = null;
+
+    if (!grouping.getDimensions().isEmpty()) {
+      for (DimensionExpression dimensionExpression : grouping.getDimensions()) 
{
+        Granularity granularity = Expressions.toQueryGranularity(
+            dimensionExpression.getDruidExpression(),
+            plannerContext.getExprMacroTable()
+        );
+        if (granularity == null) {
+          continue;
+        }
+        if (queryGranularity != null) {
+          // group by more than one timestamp_floor
+          // eg: group by timestamp_floor(__time to 
DAY),timestamp_floor(__time, to HOUR)
+          theContext = plannerContext.getQueryContext();
+          break;
+        }
+        queryGranularity = granularity;
+        int timestampDimensionIndexInDimensions = 
grouping.getDimensions().indexOf(dimensionExpression);
+        theContext = new HashMap<>(plannerContext.getQueryContext());
+        theContext.put(GroupByQuery.CTX_TIMESTAMP_RESULT_FIELD, 
dimensionExpression.getOutputName());
+        theContext.put(GroupByQuery.CTX_TIMESTAMP_RESULT_FIELD_GRANULARITY, 
queryGranularity);

Review comment:
       The Druid native query is the execution plan of the sql query. To me, it 
makes more sense to compute the granularity and dimensions and set them in the 
groupBy query in the sql planner, because the query engine should process 
whatever the given query plan as it is. This way, we can have one single place 
that optimizes the query plan which is easier to maintain and improve. Suppose 
we have multiple different places that modify the given query separately. It 
will be very hard to understand what query will be actually executed.

##########
File path: 
processing/src/main/java/org/apache/druid/query/groupby/strategy/GroupByStrategyV2.java
##########
@@ -285,6 +342,34 @@ public boolean doMergeResults(final GroupByQuery query)
     }
   }
 
+  private void fixResultRowWithTimestampResultField(

Review comment:
       Ah I see. It's not only about nested groupBys, but nested queries in 
general when those queries are created by the sql planner.

##########
File path: 
processing/src/main/java/org/apache/druid/query/groupby/strategy/GroupByStrategyV2.java
##########
@@ -251,9 +296,26 @@ public boolean doMergeResults(final GroupByQuery query)
     // pushed-down subquery (CTX_KEY_EXECUTING_NESTED_QUERY).
 
     if (!query.getContextBoolean(CTX_KEY_OUTERMOST, true)
-        || query.getPostAggregatorSpecs().isEmpty()
         || 
query.getContextBoolean(GroupByQueryConfig.CTX_KEY_EXECUTING_NESTED_QUERY, 
false)) {
       return mergedResults;
+    } else if (query.getPostAggregatorSpecs().isEmpty()) {
+      if (!hasTimestampResultField) {
+        return mergedResults;
+      }
+      return Sequences.map(
+          mergedResults,
+          row -> {
+            final ResultRow resultRow = 
ResultRow.create(query.getResultRowPostAggregatorStart());

Review comment:
       ```suggestion
               final ResultRow resultRow = 
ResultRow.create(query.getResultRowSizeWithoutPostAggregators());
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [druid] jihoonson commented on a change in pull request #11379: improve groupBy query granularity translation with 2x query performance improve when issued from sql layer

Reply via email to