amansinha100 commented on code in PR #4864:
URL: https://github.com/apache/hive/pull/4864#discussion_r1387580078


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java:
##########
@@ -755,6 +755,14 @@ private boolean checkConvertJoinSMBJoin(JoinOperator joinOp, OptimizeTezProcCont
         LOG.debug("External table {} found in join and also could not provide statistics - disabling SMB join.", sb);
         return false;
       }
+      for (Operator<?> grandParent : parentOp.getParentOperators()) {

Review Comment:
   Similar to the other method, please add a non-null check:
    if (parentOp.getParentOperators() != null) {
       for ( ...) 
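
A minimal, self-contained sketch of the guard requested above. `Op` and `countGrandParents` are hypothetical stand-ins used only for illustration; they are not Hive's `Operator<?>` class or anything from the PR. The only assumption carried over is that `getParentOperators()` may return null.

```java
import java.util.Arrays;
import java.util.List;

public class NullSafeParents {

  /** Hypothetical stand-in for Operator<?>; its parent list may be null. */
  static class Op {
    private final List<Op> parents;

    Op(List<Op> parents) {
      this.parents = parents;
    }

    List<Op> getParentOperators() {
      return parents;
    }
  }

  /** Visits each grandparent, skipping the loop entirely when the list is null. */
  static int countGrandParents(Op parentOp) {
    int visited = 0;
    if (parentOp.getParentOperators() != null) {
      for (Op grandParent : parentOp.getParentOperators()) {
        visited++; // the real code would run its per-grandparent check here
      }
    }
    return visited;
  }
}
```

Without the guard, a parent operator whose list is null would make the enhanced `for` loop throw a `NullPointerException`.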



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java:
##########
@@ -755,6 +755,14 @@ private boolean checkConvertJoinSMBJoin(JoinOperator joinOp, OptimizeTezProcCont
         LOG.debug("External table {} found in join and also could not provide statistics - disabling SMB join.", sb);
         return false;
       }
+      for (Operator<?> grandParent : parentOp.getParentOperators()) {
+        if (hasMoreGBYs(grandParent, 2)) {
+          LOG.info(
+              "We cannot convert to SMB because one of the join branches has more than one GBY in the same reducer");

Review Comment:
   nit: can we spell out GBY (group-by)? This message is logged at INFO
level, and not all readers are familiar with the acronym. Also, suggest adding
'join' after SMB.



##########
ql/src/test/queries/clientpositive/auto_sortmerge_join_17.q:
##########
@@ -0,0 +1,22 @@
+CREATE TABLE tbl1_n5(key int, value string) CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS;
+
+insert into tbl1_n5(key, value)
+values
+(0, 'val_0'),
+(2, 'val_2'),
+(9, 'val_9');
+
+set hive.optimize.semijoin.conversion = false;

Review Comment:
   It isn't clear why this config has to be set to false for this test case. If
it is needed, can you add a comment in the test file explaining why?



##########
ql/src/test/queries/clientpositive/auto_sortmerge_join_17.q:
##########
@@ -0,0 +1,22 @@
+CREATE TABLE tbl1_n5(key int, value string) CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS;
+
+insert into tbl1_n5(key, value)
+values
+(0, 'val_0'),
+(2, 'val_2'),
+(9, 'val_9');
+
+set hive.optimize.semijoin.conversion = false;
+
+explain

Review Comment:
   Can we also add a negative test case where the number of group-by operators
within a reducer is 1 or 0, so that we expect to see the SMB join being used?



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java:
##########
@@ -857,6 +865,26 @@ private boolean checkConvertJoinSMBJoin(JoinOperator joinOp, OptimizeTezProcCont
     return true;
   }
 
+  private boolean hasMoreGBYs(Operator<?> start, int max) {

Review Comment:
   A brief comment for this method would be good.
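
As an illustration of the kind of comment meant here, the sketch below documents a stand-in version of the method. The body of `hasMoreGBYs` is not shown in this hunk, so the semantics are an assumption inferred only from the call site (`hasMoreGBYs(grandParent, 2)`) and the log message: it is taken to return true when at least `max` group-by operators sit in the same reducer. The `Op` class and the `"GBY"`/`"RS"` labels are hypothetical and are not Hive's real operator types.

```java
import java.util.Arrays;
import java.util.List;

public class GroupByCountSketch {

  /** Hypothetical stand-in for Hive's Operator<?>; not the real class. */
  static class Op {
    final String kind;      // "GBY", "RS", "SEL", "TS": labels assumed for this sketch
    final List<Op> parents; // null when the operator has no parents

    Op(String kind, Op... parents) {
      this.kind = kind;
      this.parents = parents.length == 0 ? null : Arrays.asList(parents);
    }
  }

  /**
   * Returns true if at least {@code max} group-by operators are reachable from
   * {@code start} (inclusive) by walking up the parent links without crossing
   * a reduce sink, that is, while staying within the same reducer.
   * Assumed semantics; the PR's actual implementation may differ.
   */
  static boolean hasMoreGBYs(Op start, int max) {
    return countGBYs(start) >= max;
  }

  /** Counts group-by operators from op upward, stopping at a reduce sink. */
  private static int countGBYs(Op op) {
    if (op == null || "RS".equals(op.kind)) {
      return 0; // reducer boundary: operators above it run in another vertex
    }
    int count = "GBY".equals(op.kind) ? 1 : 0;
    if (op.parents != null) {
      for (Op parent : op.parents) {
        count += countGBYs(parent);
      }
    }
    return count;
  }
}
```

Whatever the real traversal does, the comment should state what is counted, where the walk stops, and whether `max` is an inclusive or exclusive threshold, since the method name alone leaves that open.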



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

