agavra commented on code in PR #9448:
URL: https://github.com/apache/pinot/pull/9448#discussion_r982943252


##########
pinot-query-planner/src/main/java/org/apache/calcite/rel/rules/PinotJoinExchangeNodeInsertRule.java:
##########
@@ -65,14 +62,14 @@ public void onMatch(RelOptRuleCall call) {
 
     RelNode leftExchange;
     RelNode rightExchange;
-    List<RelHint> hints = join.getHints();
-    if (hints.contains(PinotRelationalHints.USE_BROADCAST_DISTRIBUTE)) {
-      // TODO: determine which side should be the broadcast table based on 
table metadata
-      // TODO: support SINGLETON exchange if the non-broadcast table size is 
small enough to stay local.
-      leftExchange = LogicalExchange.create(leftInput, 
RelDistributions.RANDOM_DISTRIBUTED);
+    JoinInfo joinInfo = join.analyzeCondition();

Review Comment:
   nit: we can remove the TODO above, right?



##########
pinot-query-planner/src/main/java/org/apache/calcite/rel/rules/PinotQueryRuleSets.java:
##########
@@ -37,8 +37,8 @@ private PinotQueryRuleSets() {
           EnumerableRules.ENUMERABLE_PROJECT_RULE, 
EnumerableRules.ENUMERABLE_SORT_RULE,
           EnumerableRules.ENUMERABLE_TABLE_SCAN_RULE,
 
-          // push a filter into a join, replaced CoreRules.FILTER_INTO_JOIN 
with special config
-          PinotFilterIntoJoinRule.INSTANCE,
+          // push a filter into a join
+          CoreRules.FILTER_INTO_JOIN,

Review Comment:
   an aside, how do we choose which rules to use here? I notice we don't 
include, for example, `SORT_JOIN_TRANSPOSE` which can push sorts down in the 
case of `LEFT/RIGHT OUTER JOIN` but that isn't here?



##########
pinot-query-planner/src/main/java/org/apache/calcite/rel/rules/PinotJoinExchangeNodeInsertRule.java:
##########
@@ -65,14 +62,14 @@ public void onMatch(RelOptRuleCall call) {
 
     RelNode leftExchange;
     RelNode rightExchange;
-    List<RelHint> hints = join.getHints();
-    if (hints.contains(PinotRelationalHints.USE_BROADCAST_DISTRIBUTE)) {
-      // TODO: determine which side should be the broadcast table based on 
table metadata
-      // TODO: support SINGLETON exchange if the non-broadcast table size is 
small enough to stay local.
-      leftExchange = LogicalExchange.create(leftInput, 
RelDistributions.RANDOM_DISTRIBUTED);
+    JoinInfo joinInfo = join.analyzeCondition();
+
+    if (joinInfo.leftKeys.isEmpty()) {
+      // when there's no JOIN key, use broadcast.
+      leftExchange = LogicalExchange.create(leftInput, 
RelDistributions.SINGLETON);

Review Comment:
   just to make sure that I'm understanding this correctly - using `SINGLETON` 
on the left and `BROADCAST_DISTRIBUTED` on the right would mean that each row 
in the right table is broadcasted to every node that hosts a segment of the 
left table, and the left tables remain in place (essentially not exchanged at 
all)?
   
   > when there's no join key, use broadcast assuming the right table is 
smaller.
   
   I noticed that there's a `computeSelfCost` on `RelNode` which wires down to 
basically getting a row count. Is there a way for us to hook that in to make 
sure that we're actually choosing the smaller table? Obviously this can be left 
for a follow-up, just wanted to learn :) 



##########
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/HashJoinOperator.java:
##########
@@ -132,7 +139,11 @@ private TransferableBlock 
buildJoinedDataBlock(TransferableBlock leftBlock)
         List<Object[]> hashCollection = _broadcastHashTable.getOrDefault(
             _leftKeySelector.computeHash(leftRow), Collections.emptyList());
         for (Object[] rightRow : hashCollection) {
-          rows.add(joinRow(leftRow, rightRow));
+          Object[] resultRow = joinRow(leftRow, rightRow);
+          if (_joinClauseEvaluators.isEmpty() || 
_joinClauseEvaluators.stream().allMatch(

Review Comment:
   it looks like `_joinClauseEvaluators.isEmpty()` is true if there are only 
non-equijoin conditions - I didn't fully look into calcite code, but how does 
it handle a situation with both equi and non-equi joins (e.g. `JOIN ON a.col1 = 
b.col1 AND a.col2 > b.col2`)?
   
   can we add a test for that scenario?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to