zabetak commented on a change in pull request #2891:
URL: https://github.com/apache/hive/pull/2891#discussion_r771938082
##########
File path: ql/src/test/results/clientpositive/llap/join_1to1.q.out
##########
@@ -373,9 +373,6 @@ NULL NULL NULL 25 10025 66
NULL NULL NULL 30 10030 88
NULL NULL NULL 35 10035 88
NULL NULL NULL 40 10040 88
-NULL NULL NULL 50 10050 66
-NULL NULL NULL 50 10050 66
-NULL NULL NULL 50 10050 88
Review comment:
Indeed these rows shouldn't be there. It's a pity that people update
q.out files without paying attention if the resulting changes are correct. We
could have found this bug much earlier if people verified the results against
another DBMS as well.
##########
File path:
ql/src/java/org/apache/hadoop/hive/ql/exec/CommonMergeJoinOperator.java
##########
@@ -302,7 +322,47 @@ public void process(Object row, int tag) throws
HiveException {
assert !nextKeyGroup;
candidateStorage[tag].addRow(value);
+ }
+
+ private void emitExtensionRow(int tag, List<Object> value) throws
HiveException {
+ extensionStorage[tag].addRow(value);
+ for (byte i = 0; i < order.length; i++) {
+ if (i == tag) {
+ storage[i] = extensionStorage[i];
+ } else {
+ putDummyOrEmpty(i);
+ }
+ }
+ checkAndGenObject();
+ extensionStorage[tag].clearRows();
+ }
+ /**
+ * Decides if the actual row must be an extension row.
+ *
+ * Extension rows are those which are not part of the inner-join.
+ * May not correctly identify all extension rows - but will remove trivially
filtered ones.
Review comment:
`May not correctly...` It is not clear what this means. Possibly an
example could help clarify this better.
What happens in the case where it does not identify an extension row? Wrong
results?
##########
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##########
@@ -1836,6 +1836,8 @@ private static void populateLlapDaemonVarsSet(Set<String>
llapDaemonVarsSetLocal
HIVEALIAS("hive.alias", "", ""),
HIVEMAPSIDEAGGREGATE("hive.map.aggr", true, "Whether to use map-side
aggregation in Hive Group By queries"),
HIVEGROUPBYSKEW("hive.groupby.skewindata", false, "Whether there is skew
in data to optimize group by queries"),
+ HIVE_JOIN_SHORTCUT_EXTENSIONROWS("hive.join.shortcut.extensionrows", true,
+ "Enables to shortcut processing of known filtered rows in merge
joins."),
Review comment:
If my understanding is correct if this is set to false the outer join
produces incorrect results. I don't think it is a good idea to have a property
which determines the correctness of the results.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]