zabetak commented on a change in pull request #2891:
URL: https://github.com/apache/hive/pull/2891#discussion_r772195126



##########
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/CommonMergeJoinOperator.java
##########
@@ -92,6 +94,7 @@
 
   // A field because we cannot multi-inherit.
   transient InterruptibleProcessing interruptChecker;
+  transient boolean shortcutExtensionRows;

Review comment:
       Minor: should we use `unmatched` everywhere we use `extension`?
   
   "Unmatched rows of an outer join" sounds more natural than "Extension rows 
of an outer join".

##########
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##########
@@ -1836,6 +1836,8 @@ private static void populateLlapDaemonVarsSet(Set<String> 
llapDaemonVarsSetLocal
     HIVEALIAS("hive.alias", "", ""),
     HIVEMAPSIDEAGGREGATE("hive.map.aggr", true, "Whether to use map-side 
aggregation in Hive Group By queries"),
     HIVEGROUPBYSKEW("hive.groupby.skewindata", false, "Whether there is skew 
in data to optimize group by queries"),
+    HIVE_JOIN_SHORTCUT_EXTENSIONROWS("hive.join.shortcut.extensionrows", true,
+        "Enables to shortcut processing of known filtered rows in merge 
joins."),

Review comment:
       I see your point but I still think that it would be better to avoid 
toggles when it comes to correctness issues. I guess it is not the first time 
we are doing this so I am not gonna push back on this but it shouldn't be the 
norm. I am leaving the final judgement up to you.
   
   If it goes in I would add some extra information that it is for "internal 
use only" and changing the value can lead to incorrect results.

##########
File path: ql/src/test/results/clientpositive/llap/join_1to1.q.out
##########
@@ -597,22 +592,10 @@ NULL      NULL    NULL    10      10010   66
 NULL   NULL    NULL    25      10025   66
 NULL   NULL    NULL    30      10030   88
 NULL   NULL    NULL    35      10035   88
-NULL   NULL    NULL    40      10040   66
 NULL   NULL    NULL    40      10040   88
-NULL   NULL    NULL    40      10040   88
-NULL   NULL    NULL    50      10050   66
-NULL   NULL    NULL    50      10050   66
-NULL   NULL    NULL    50      10050   66
-NULL   NULL    NULL    50      10050   66
-NULL   NULL    NULL    50      10050   88
 NULL   NULL    NULL    50      10050   88
-NULL   NULL    NULL    50      10050   88
-NULL   NULL    NULL    70      10040   88
 NULL   NULL    NULL    70      10040   88
 NULL   NULL    NULL    70      10040   88

Review comment:
       I didn't verify that all the changes in this file are correct. Should I 
do it or you went over them already?

##########
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/CommonMergeJoinOperator.java
##########
@@ -302,7 +322,47 @@ public void process(Object row, int tag) throws 
HiveException {
 
     assert !nextKeyGroup;
     candidateStorage[tag].addRow(value);
+  }
+
+  private void emitExtensionRow(int tag, List<Object> value) throws 
HiveException {
+    extensionStorage[tag].addRow(value);
+    for (byte i = 0; i < order.length; i++) {
+      if (i == tag) {
+        storage[i] = extensionStorage[i];
+      } else {
+        putDummyOrEmpty(i);
+      }
+    }
+    checkAndGenObject();
+    extensionStorage[tag].clearRows();
+  }
 
+  /**
+   * Decides if the actual row must be an extension row.
+   *
+   * Extension rows are those which are not part of the inner-join.
+   * May not correctly identify all extension rows - but will remove trivially 
filtered ones.

Review comment:
       Since you have examples that do not work, maybe it would be better to 
write "The current implementation, does not correctly identify all..." and put 
a reference to a follow up JIRA highlighting that there is an implementation 
limitation.

##########
File path: ql/src/test/results/clientpositive/llap/join_1to1.q.out
##########
@@ -373,9 +373,6 @@ NULL        NULL    NULL    25      10025   66
 NULL   NULL    NULL    30      10030   88
 NULL   NULL    NULL    35      10035   88
 NULL   NULL    NULL    40      10040   88
-NULL   NULL    NULL    50      10050   66
-NULL   NULL    NULL    50      10050   66
-NULL   NULL    NULL    50      10050   88

Review comment:
       I am not a big fan of the the `src` and similar tables but if people 
validated the results against another DBMS I can leave with it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to