Akshat-Jain commented on code in PR #16326:
URL: https://github.com/apache/druid/pull/16326#discussion_r1833647434
##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/querykit/groupby/GroupByQueryKit.java:
##########
@@ -148,7 +148,7 @@ public QueryDefinition makeQueryDefinition(
partitionBoost = true;
} else {
- shuffleSpecFactoryPreAggregation = doLimitOrOffset
+ shuffleSpecFactoryPreAggregation = doLimitOrOffset ||
intermediateClusterBy.isEmpty()
Review Comment:
Was this a bug that surfaced only when we had only summary rows?
##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/querykit/groupby/GroupByPostShuffleFrameProcessor.java:
##########
@@ -263,6 +267,41 @@ private void writeCurrentFrameIfNeeded() throws IOException
outputChannel.write(new FrameWithPartition(frame,
FrameWithPartition.NO_PARTITION));
frameWriter.close();
frameWriter = null;
+ outputRows += frame.numRows();
+ }
+ }
+
+ /**
+ * Generate a frame containing a single row with default values of all
aggregations if needed. This method uses
+ * {@link GroupingEngine#summaryRowPreconditions(GroupByQuery)} to determine
if such an operation is needed.
+ *
+ * Note that in cases where {@link
GroupingEngine#summaryRowPreconditions(GroupByQuery)} returns true, the
+ * preceding {@link GroupByPreShuffleFrameProcessorFactory} stage would use
an empty {@link ClusterBy}. Therefore,
+ * there would only be a single output partition of the prior stage, and
therefore a single instance of
+ * this processor. This ensures that only a single null-aggregations row is
generated for the entire stage.
+ */
+ private void writeEmptyAggregationsFrameIfNeeded() throws IOException
+ {
+ // Check isIncludeNullResultRow, which is populated by the SQL planner
when doing a query like GROUP BY ().
Review Comment:
Which part of this logic corresponds to `isIncludeNullResultRow` condition?
Trying to understand if it maps to something, or if this is an outdated comment
by any chance.
##########
processing/src/main/java/org/apache/druid/query/groupby/GroupingEngine.java:
##########
@@ -985,7 +988,7 @@ private static boolean summaryRowPreconditions(GroupByQuery
query)
return false;
}
}
- if (!query.getDimensions().isEmpty()) {
+ if (!query.getDimensions().isEmpty() || query.hasDroppedDimensions()) {
Review Comment:
I'm trying to understand why this change is required. Can you maybe suggest
some example query which gives unexpected output if we don't have this change?
I tried a few queries with/without this change but unable to find a query
that shows the difference.
I do see that
`CalciteQueryTest#testGroupByWithFilterMatchingNothingWithGroupByLiteral` fails
without this change:
```
2024-11-08T04:50:34,690 INFO [main]
org.apache.druid.sql.calcite.BaseCalciteQueryTest - -- Expected results --
----
2024-11-08T04:50:34,690 INFO [main]
org.apache.druid.sql.calcite.BaseCalciteQueryTest - -- Actual results --
new Object[]{0L, null}
----
```
But I'm unable to get the same behavior when I try running a similar query
locally:
```sql
SELECT COUNT(*) FROM wikipedia WHERE cityName = 'foobar' GROUP BY 'dummy'
```
##########
processing/src/main/java/org/apache/druid/query/timeseries/TimeseriesQueryQueryToolChest.java:
##########
@@ -231,32 +232,17 @@ public Comparator<Result<TimeseriesResultValue>>
createResultComparator(Query<Re
return ResultGranularTimestampComparator.create(query.getGranularity(),
((TimeseriesQuery) query).isDescending());
}
- private Result<TimeseriesResultValue>
getNullTimeseriesResultValue(TimeseriesQuery query)
+ /**
+ * Returns a {@link TimeseriesResultValue} that corresponds to an empty-set
aggregation, which is used in situations
+ * where we want to return a single result representing "nothing was
aggregated".
+ */
+ public Result<TimeseriesResultValue>
getEmptyTimeseriesResultValue(TimeseriesQuery query)
Review Comment:
Nit: Can be made `protected` instead of `public`
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]