cryptoe commented on code in PR #14994:
URL: https://github.com/apache/druid/pull/14994#discussion_r1356112683


##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/ControllerImpl.java:
##########
@@ -1693,8 +1687,9 @@ private static QueryDefinition makeQueryDefinition(
       shuffleSpecFactory = ShuffleSpecFactories.singlePartition();
       queryToPlan = querySpec.getQuery();
     } else if (querySpec.getDestination() instanceof DurableStorageMSQDestination) {
-      // we add a final stage which generates one partition per worker.
-      shuffleSpecFactory = ShuffleSpecFactories.globalSortWithMaxPartitionCount(tuningConfig.getMaxNumWorkers());
+      shuffleSpecFactory = ShuffleSpecFactories.getGlobalSortWithTargetSize(
+          MultiStageQueryContext.getRowsPerPage(querySpec.getQuery().context())

Review Comment:
   Yes, `GlobalSortTargetSizeShuffleSpec` enforces a limit on partition size globally.
   
   So there can be 2 cases:
   1. If the last stage is a group-by post-shuffle stage, then we know that each partition will be present on only one distinct worker. Hence the page size controls the number of rows in that partition.
   
   2. If the last stage is a scan stage, then we add a new `QueryResultFrameProcessor`, since the data needs to be sorted on the boost column. The `QueryResultFrameProcessor` merges the results within the same partition and writes out a single partition. Since the partition cuts on size are done globally, in the controller, the final partitions would match the configured page size.
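   To make the global cut behavior concrete, here is a minimal, self-contained sketch (hypothetical class and method names, not Druid's actual shuffle-spec classes) of the arithmetic a target-size partitioning implies: cutting a total row count into pages of at most `rowsPerPage` rows, with at least one partition even for empty results.

   ```java
   // Hypothetical sketch of target-size partition cuts; not Druid's real API.
   public class TargetSizePartitioning {
       // Number of partitions when cutting totalRows into pages of rowsPerPage
       // rows each (ceiling division), with a minimum of one partition.
       static long partitionCount(long totalRows, long rowsPerPage) {
           if (rowsPerPage <= 0) {
               throw new IllegalArgumentException("rowsPerPage must be positive");
           }
           return Math.max(1, (totalRows + rowsPerPage - 1) / rowsPerPage);
       }

       public static void main(String[] args) {
           System.out.println(partitionCount(1_000_000, 100_000)); // 10 pages
           System.out.println(partitionCount(250_000, 100_000));   // 3 pages
           System.out.println(partitionCount(0, 100_000));         // 1 page
       }
   }
   ```

   Because the cuts are decided globally in the controller rather than per worker, the number of result pages tracks `rowsPerPage` instead of the worker count.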
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

