gianm commented on code in PR #13506:
URL: https://github.com/apache/druid/pull/13506#discussion_r1127470521
##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/kernel/StageDefinition.java:
##########
@@ -288,37 +322,72 @@ public boolean mustGatherResultKeyStatistics()
return shuffleSpec != null && shuffleSpec.needsStatistics();
}
- public Either<Long, ClusterByPartitions> generatePartitionsForShuffle(
+ public Either<Long, ClusterByPartitions> generatePartitionBoundariesForShuffle(
@Nullable ClusterByStatisticsCollector collector
)
{
if (shuffleSpec == null) {
throw new ISE("No shuffle for stage[%d]", getStageNumber());
+ } else if (shuffleSpec.kind() != ShuffleKind.GLOBAL_SORT) {
+ throw new ISE(
+ "Shuffle of kind [%s] cannot generate partition boundaries for stage[%d]",
+ shuffleSpec.kind(),
+ getStageNumber()
+ );
} else if (mustGatherResultKeyStatistics() && collector == null) {
throw new ISE("Statistics required, but not gathered for stage[%d]", getStageNumber());
} else if (!mustGatherResultKeyStatistics() && collector != null) {
throw new ISE("Statistics gathered, but not required for stage[%d]", getStageNumber());
} else {
- return shuffleSpec.generatePartitions(collector, MAX_PARTITIONS);
+ return shuffleSpec.generatePartitionsForGlobalSort(collector, MAX_PARTITIONS);
}
}
public ClusterByStatisticsCollector createResultKeyStatisticsCollector(final int maxRetainedBytes)
{
if (!mustGatherResultKeyStatistics()) {
- throw new ISE("No statistics needed");
+ throw new ISE("No statistics needed for stage[%d]", getStageNumber());
}
return ClusterByStatisticsCollectorImpl.create(
- shuffleSpec.getClusterBy(),
+ shuffleSpec.clusterBy(),
signature,
maxRetainedBytes,
PARTITION_STATS_MAX_BUCKETS,
- shuffleSpec.doesAggregateByClusterKey(),
+ shuffleSpec.doesAggregate(),
shuffleCheckHasMultipleValues
);
Review Comment:
It's specifically about gathering result key statistics for the purpose of generating partitions for the global sort. (It's populating sketches to figure out range cut points.) The user doesn't really enter into it. I updated some javadoc to hopefully make this clearer.
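To make "figuring out range cut points" concrete, here is a minimal, self-contained sketch. It is not Druid code: `RangeCutPoints` and `cutPoints` are hypothetical names, and it picks boundaries from an exact, sorted in-memory sample of keys, whereas the stage described above populates sketches through `ClusterByStatisticsCollector`. The idea is the same either way: take evenly spaced quantiles of the observed keys and use them as partition boundaries for the global sort.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical illustration only: chooses range-partition cut points from a
 * sample of result keys. An exact sorted sample stands in for the quantile
 * sketches a real statistics collector would populate.
 */
public class RangeCutPoints
{
  /**
   * Returns at most (numPartitions - 1) strictly increasing boundary keys.
   * Partition 0 holds keys below boundary[0]; partition i holds keys k with
   * boundary[i - 1] <= k < boundary[i]; the last partition is open-ended.
   */
  public static List<Long> cutPoints(List<Long> sampledKeys, int numPartitions)
  {
    if (numPartitions < 1) {
      throw new IllegalArgumentException("numPartitions must be >= 1");
    }
    List<Long> sorted = new ArrayList<>(sampledKeys);
    Collections.sort(sorted);

    List<Long> boundaries = new ArrayList<>();
    if (sorted.isEmpty()) {
      return boundaries; // No keys observed, so there is nothing to cut.
    }
    for (int i = 1; i < numPartitions; i++) {
      // Key at the i/numPartitions quantile of the sample.
      int index = (int) ((long) i * sorted.size() / numPartitions);
      long candidate = sorted.get(index);
      // Skip repeated keys so the boundaries stay strictly increasing.
      if (boundaries.isEmpty() || boundaries.get(boundaries.size() - 1) < candidate) {
        boundaries.add(candidate);
      }
    }
    return boundaries;
  }

  public static void main(String[] args)
  {
    List<Long> sample = List.of(5L, 1L, 9L, 3L, 7L, 2L, 8L, 4L, 6L, 10L);
    // Prints [3, 6, 8]: partitions {1,2}, {3,4,5}, {6,7}, {8,9,10}.
    System.out.println(cutPoints(sample, 4));
  }
}
```

Note that heavily repeated keys can yield fewer boundaries than requested (for example, a sample of all-equal keys produces a single boundary), which is one reason a real implementation treats the partition count as an upper bound rather than an exact target.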
--
This is an automated message from the Apache Git Service.