cryptoe commented on code in PR #14994:
URL: https://github.com/apache/druid/pull/14994#discussion_r1356112683
##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/ControllerImpl.java:
##########
@@ -1693,8 +1687,9 @@ private static QueryDefinition makeQueryDefinition(
shuffleSpecFactory = ShuffleSpecFactories.singlePartition();
queryToPlan = querySpec.getQuery();
} else if (querySpec.getDestination() instanceof DurableStorageMSQDestination) {
- // we add a final stage which generates one partition per worker.
- shuffleSpecFactory = ShuffleSpecFactories.globalSortWithMaxPartitionCount(tuningConfig.getMaxNumWorkers());
+ shuffleSpecFactory = ShuffleSpecFactories.getGlobalSortWithTargetSize(
+ MultiStageQueryContext.getRowsPerPage(querySpec.getQuery().context())
Review Comment:
Yes, `GlobalSortTargetSizeShuffleSpec` enforces the partition size limit
globally.
So there are two cases:
1. If the last stage is a group-by post-shuffle stage, each partition is
present on exactly one worker. Hence the page size controls the number of
rows in that partition.
2. If the last stage is a scan stage, we add a new
`QueryResultFrameProcessor`, since the data needs to be sorted on the boost
column. The `QueryResultFrameProcessor` merges the results that belong to the
same partition and writes out a single partition. Since the partition cuts on
size are made globally, in the controller, each final partition ends up equal
to the configured page size.
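
To make the "cuts are made globally" point concrete, here is a minimal sketch
of target-size partition cutting. It is not Druid's implementation: the class
`GlobalPartitionCutSketch`, the method `cutPartitions`, and the bucket counts
are hypothetical, and it assumes the controller already has row counts per key
range, in global sort order, summed across all workers.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: hypothetical names, not part of the Druid codebase.
public class GlobalPartitionCutSketch {

    /**
     * Cuts partitions over key ranges that are already in global sort order.
     *
     * @param rowsPerBucket row count per key range, summed across all workers (assumed input)
     * @param targetRows    target rows per partition, e.g. the configured rows per page
     * @return row count of each resulting partition, in sort order
     */
    static List<Long> cutPartitions(long[] rowsPerBucket, long targetRows) {
        if (targetRows <= 0) {
            throw new IllegalArgumentException("targetRows must be positive");
        }
        List<Long> partitions = new ArrayList<>();
        long current = 0;
        for (long rows : rowsPerBucket) {
            // Close the current partition if adding this bucket would exceed the target.
            // A single bucket larger than the target still becomes its own (oversized) partition.
            if (current > 0 && current + rows > targetRows) {
                partitions.add(current);
                current = 0;
            }
            current += rows;
        }
        if (current > 0) {
            partitions.add(current);
        }
        return partitions;
    }

    public static void main(String[] args) {
        long[] buckets = {400, 300, 500, 200, 100};
        // Prints [700, 800]
        System.out.println(cutPartitions(buckets, 1000));
    }
}
```

Because the cut points depend only on the global counts, the number of workers
that hold rows for a given key range does not change the size of the final
partition, which is the property both cases above rely on.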
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]