jihoonson commented on a change in pull request #11379:
URL: https://github.com/apache/druid/pull/11379#discussion_r663315846
##########
File path:
processing/src/main/java/org/apache/druid/query/groupby/strategy/GroupByStrategyV2.java
##########
@@ -213,7 +216,48 @@ public boolean doMergeResults(final GroupByQuery query)
context.put("finalize", false);
context.put(GroupByQueryConfig.CTX_KEY_STRATEGY, GroupByStrategySelector.STRATEGY_V2);
context.put(CTX_KEY_OUTERMOST, false);
- if (query.getUniversalTimestamp() != null) {
+ Map<String, Object> timestampFieldContext = GroupByQueryHelper.findTimestampResultField(query);
+ context.putAll(timestampFieldContext);
+
+ Granularity granularity = query.getGranularity();
+ List<DimensionSpec> dimensionSpecs = query.getDimensions();
+ final String timestampResultField = (String) timestampFieldContext.get(GroupByQuery.CTX_TIMESTAMP_RESULT_FIELD);
+ final boolean hasTimestampResultField = timestampResultField != null
+     && query.getContextBoolean(CTX_KEY_OUTERMOST, true);
+ int timestampResultFieldIndex = 0;
+ if (hasTimestampResultField) {
+   // sql like "group by city_id,time_floor(__time to day)",
+   // the original translated query is granularity=all and dimensions:[d0, d1]
+   // the better plan is granularity=day and dimensions:[d0]
+   // but the ResultRow structure is changed from [d0, d1] to [__time, d0]
+   // this structure should be fixed as [d0, d1] (actually it is [d0, __time]) before postAggs are called
+   final Granularity timestampResultFieldGranularity
+       = (Granularity) timestampFieldContext.get(GroupByQuery.CTX_TIMESTAMP_RESULT_FIELD_GRANULARITY);
+   dimensionSpecs =
+       query.getDimensions()
+            .stream()
+            .filter(dimensionSpec -> !dimensionSpec.getOutputName().equals(timestampResultField))
+            .collect(Collectors.toList());
+   granularity = timestampResultFieldGranularity;
+   // when timestampResultField is the last dimension, should set sortByDimsFirst=true,
+   // otherwise the downstream is sorted by row's timestamp first which makes the final ordering not as expected
+   timestampResultFieldIndex = (int) timestampFieldContext.get(GroupByQuery.CTX_TIMESTAMP_RESULT_FIELD_INDEX);
+   if (!query.getContextSortByDimsFirst() && timestampResultFieldIndex == query.getDimensions().size() - 1) {
+     context.put(GroupByQuery.CTX_KEY_SORT_BY_DIMS_FIRST, true);
+   }
+   // when timestampResultField is the first dimension and sortByDimsFirst=true,
+   // it is actually equals to sortByDimsFirst=false
+   if (query.getContextSortByDimsFirst() && timestampResultFieldIndex == 0) {
+     context.put(GroupByQuery.CTX_KEY_SORT_BY_DIMS_FIRST, false);
+   }
+   // when hasTimestampResultField=true and timestampResultField is neither first nor last dimension,
+   // the DefaultLimitSpec will always do the reordering
+ }
Review comment:
Hmm, my apologies if my comment was not clear. What I meant is that
something similar to your original approach seems better to me, because the
SQL planner makes all the decisions and the groupBy engine does whatever it is
told to do. More precisely, I'm thinking of having
`DruidQuery.toGroupByQuery()` determine the granularity and dimensions. The
granularity can probably be passed directly to the constructor of
`GroupByQuery`. For the dimensions, we can still pass
`grouping.getDimensionSpecs()` because, as you mentioned, it makes the query
planning easier. Instead, the SQL planner passes `timestampResultField` via
the query context, so that the groupBy engine can adjust the dimensions
correctly before it starts query execution. This way, all decisions are made
in the SQL planner, and the groupBy engine just performs one query rewrite per
the planner's decision. I think this is better because 1) we will have only
one brain that is responsible for query optimization and 2) `explain plan`
will return the same native query as the one that is actually executed.
`fudgeTimestamp` is rather similar to this approach because the groupBy engine
doesn't do anything smart, but just does whatever it is told to do.
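
To make the split of responsibilities concrete, here is a minimal, self-contained sketch of the idea. It does not use Druid classes: `PlannerDrivenRewrite`, `Query`, `plan`, and `rewrite` are hypothetical stand-ins, and the context key string is a placeholder rather than the real `GroupByQuery` constant. The planner picks the granularity and records `timestampResultField` in the context; the engine only drops that dimension before execution.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration only; none of these types are Druid classes.
class PlannerDrivenRewrite
{
  // Placeholder key; the real constant is defined in GroupByQuery.
  static final String CTX_TIMESTAMP_RESULT_FIELD = "timestampResultField";

  static final class Query
  {
    final String granularity;
    final List<String> dimensions;
    final Map<String, Object> context;

    Query(String granularity, List<String> dimensions, Map<String, Object> context)
    {
      this.granularity = granularity;
      this.dimensions = dimensions;
      this.context = context;
    }
  }

  // Planner side: makes every decision. It sets the granularity directly,
  // keeps the full dimension list, and records which dimension is the
  // floored timestamp so the engine knows what to adjust.
  static Query plan(List<String> dimensions, String timeFloorDimension, String timeFloorGranularity)
  {
    Map<String, Object> context = new HashMap<>();
    if (timeFloorDimension == null) {
      return new Query("all", dimensions, context);
    }
    context.put(CTX_TIMESTAMP_RESULT_FIELD, timeFloorDimension);
    return new Query(timeFloorGranularity, dimensions, context);
  }

  // Engine side: one mechanical rewrite driven purely by the context.
  // It drops the timestamp dimension and decides nothing on its own.
  static Query rewrite(Query query)
  {
    String field = (String) query.context.get(CTX_TIMESTAMP_RESULT_FIELD);
    if (field == null) {
      return query;
    }
    List<String> dims = new ArrayList<>(query.dimensions);
    dims.remove(field);
    return new Query(query.granularity, dims, query.context);
  }

  public static void main(String[] args)
  {
    // Corresponds to: group by city_id, time_floor(__time to day)
    Query planned = plan(Arrays.asList("d0", "d1"), "d1", "day");
    Query executed = rewrite(planned);
    System.out.println(executed.granularity + " " + executed.dimensions);
  }
}
```

In this sketch the planned query already carries the final granularity and the context entry, so whatever `explain plan` shows is enough to predict what the engine will run.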
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]