zabetak commented on a change in pull request #2891:
URL: https://github.com/apache/hive/pull/2891#discussion_r771938082



##########
File path: ql/src/test/results/clientpositive/llap/join_1to1.q.out
##########
@@ -373,9 +373,6 @@ NULL        NULL    NULL    25      10025   66
 NULL   NULL    NULL    30      10030   88
 NULL   NULL    NULL    35      10035   88
 NULL   NULL    NULL    40      10040   88
-NULL   NULL    NULL    50      10050   66
-NULL   NULL    NULL    50      10050   66
-NULL   NULL    NULL    50      10050   88

Review comment:
       Indeed these rows shouldn't be there. It's a pity that people update 
q.out files without paying attention if the resulting changes are correct. We 
could have found this bug much earlier if people verified the results against 
another DBMS as well.

##########
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/CommonMergeJoinOperator.java
##########
@@ -302,7 +322,47 @@ public void process(Object row, int tag) throws 
HiveException {
 
     assert !nextKeyGroup;
     candidateStorage[tag].addRow(value);
+  }
+
+  private void emitExtensionRow(int tag, List<Object> value) throws 
HiveException {
+    extensionStorage[tag].addRow(value);
+    for (byte i = 0; i < order.length; i++) {
+      if (i == tag) {
+        storage[i] = extensionStorage[i];
+      } else {
+        putDummyOrEmpty(i);
+      }
+    }
+    checkAndGenObject();
+    extensionStorage[tag].clearRows();
+  }
 
+  /**
+   * Decides if the actual row must be an extension row.
+   *
+   * Extension rows are those which are not part of the inner-join.
+   * May not correctly identify all extension rows - but will remove trivially 
filtered ones.

Review comment:
       `May not correctly...` It is not clear what this means. Possibly an 
example could help clarify this better.
   What happens in the case where it does not identify an extension row? Wrong 
results?

##########
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##########
@@ -1836,6 +1836,8 @@ private static void populateLlapDaemonVarsSet(Set<String> 
llapDaemonVarsSetLocal
     HIVEALIAS("hive.alias", "", ""),
     HIVEMAPSIDEAGGREGATE("hive.map.aggr", true, "Whether to use map-side 
aggregation in Hive Group By queries"),
     HIVEGROUPBYSKEW("hive.groupby.skewindata", false, "Whether there is skew 
in data to optimize group by queries"),
+    HIVE_JOIN_SHORTCUT_EXTENSIONROWS("hive.join.shortcut.extensionrows", true,
+        "Enables to shortcut processing of known filtered rows in merge 
joins."),

Review comment:
       If my understanding is correct if this is set to false the outer join 
produces incorrect results. I don't think it is a good idea to have a property 
which determines the correctness of the results.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to