gortiz commented on code in PR #14507:
URL: https://github.com/apache/pinot/pull/14507#discussion_r1900919744
##########
pinot-query-planner/src/main/java/org/apache/pinot/query/planner/physical/colocated/GreedyShuffleRewriteVisitor.java:
##########
@@ -209,24 +209,43 @@ public Set<ColocationKey> visitMailboxSend(MailboxSendNode node, GreedyShuffleRe
boolean canSkipShuffleBasic = colocationKeyCondition(oldColocationKeys, distributionKeys);
// If receiver is not a join-stage, then we can determine distribution type now.
- if (!context.isJoinStage(node.getReceiverStageId())) {
+ Iterable<Integer> receiverStageIds = node.getReceiverStageIds();
+ if (noneIsJoin(receiverStageIds, context)) {
Set<ColocationKey> colocationKeys;
- if (canSkipShuffleBasic && areServersSuperset(node.getReceiverStageId(), node.getStageId())) {
+ if (canSkipShuffleBasic && allAreSuperSet(receiverStageIds, node)) {
Review Comment:
Otherwise we cannot apply the shuffle optimization.
This means that if we find two equivalent stages where one can be optimized
with a colocated join and the other cannot, we need to decide whether to apply
spooling or the colocated optimization.
Which one is better? I'm not sure; we will probably need data to understand
the difference. In theory, if we don't apply spooling, we end up executing the
sender stage twice: one execution skips the shuffle, but the other shuffles
anyway, so the asymptotic cost is the same. If we apply spooling, the same
amount of data is shuffled, but we do less work because the sender stage is
executed only once.
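
To make the comparison above concrete, here is a toy cost model. It is only a
sketch: the class, method names, and unit costs are hypothetical and do not
exist in Pinot, and it assumes exactly two receivers, one of which could be
colocated, with the sender's output shuffled once in either plan.

```java
// Hypothetical back-of-the-envelope model; none of these names exist in Pinot.
final class SpoolVsColocatedSketch {

  // Without spooling: the sender stage runs once per receiver (twice here).
  // The colocated receiver skips the shuffle, the other one still pays for it.
  static long withoutSpool(long senderCost, long shuffleCost) {
    return 2 * senderCost + shuffleCost;
  }

  // With spooling: the sender stage runs once. Assumption: the amount of data
  // shuffled is the same as in the non-spooled plan.
  static long withSpool(long senderCost, long shuffleCost) {
    return senderCost + shuffleCost;
  }

  public static void main(String[] args) {
    long senderCost = 100; // assumed cost of executing the sender stage once
    long shuffleCost = 40; // assumed cost of shuffling the sender's output once
    System.out.println("without spool: " + withoutSpool(senderCost, shuffleCost));
    System.out.println("with spool:    " + withSpool(senderCost, shuffleCost));
  }
}
```

Under these assumptions the difference between the two plans is exactly one
execution of the sender stage, which is why real measurements are needed to
tell whether that saving matters in practice.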