kgyrtkirk commented on code in PR #5037:
URL: https://github.com/apache/hive/pull/5037#discussion_r1475951785


##########
ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/PlanMapper.java:
##########
@@ -200,54 +199,46 @@ public void merge(Object o1, Object o2) {
   }
 
   private void link(Object o1, Object o2, boolean mayMerge) {
-
-    Set<Object> keySet = Collections.newSetFromMap(new IdentityHashMap<Object, 
Boolean>());
-    keySet.add(o1);
-    keySet.add(o2);
-    keySet.add(getKeyFor(o1));
-    keySet.add(getKeyFor(o2));
-
-    Set<EquivGroup> mGroups = Collections.newSetFromMap(new 
IdentityHashMap<EquivGroup, Boolean>());
-
-    for (Object object : keySet) {
-      EquivGroup group = objectMap.get(object);
-      if (group != null) {
-        mGroups.add(group);
-      }
+    // Caches signatures on the first access. A signature of an Operator could 
change as optimizers could mutate it,
+    // keeping its semantics. The current implementation caches the signature 
before optimizations so that we can
+    // link Operators with their signatures at consistent timing.
+    registerSignature(o1);
+    registerSignature(o2);
+    final EquivGroup linkedGroup1 = objectMap.get(o1);
+    final EquivGroup linkedGroup2 = objectMap.get(o2);
+
+    if (linkedGroup1 == null && linkedGroup2 == null) {
+      final EquivGroup group = new EquivGroup();
+      group.add(o1);
+      group.add(o2);
+      groups.add(group);
+      return;
     }
-    if (mGroups.size() > 1) {
-      if (!mayMerge) {
-        throw new RuntimeException("equivalence mapping violation");
-      }
-      EquivGroup newGrp = new EquivGroup();
-      newGrp.add(o1);
-      newGrp.add(o2);
-      for (EquivGroup g : mGroups) {
-        for (Object o : g.members) {
-          newGrp.add(o);
-        }
-      }
-      groups.add(newGrp);
-      groups.removeAll(mGroups);
-    } else {
-      EquivGroup targetGroup = mGroups.isEmpty() ? new EquivGroup() : 
mGroups.iterator().next();
-      groups.add(targetGroup);
-      targetGroup.add(o1);
-      targetGroup.add(o2);
+    if (linkedGroup1 == null || linkedGroup2 == null || linkedGroup1 == 
linkedGroup2) {

Review Comment:
   what's all these changes do better than the old?



##########
ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/PlanMapper.java:
##########
@@ -200,54 +199,46 @@ public void merge(Object o1, Object o2) {
   }
 
   private void link(Object o1, Object o2, boolean mayMerge) {
-
-    Set<Object> keySet = Collections.newSetFromMap(new IdentityHashMap<Object, 
Boolean>());
-    keySet.add(o1);
-    keySet.add(o2);
-    keySet.add(getKeyFor(o1));
-    keySet.add(getKeyFor(o2));
-
-    Set<EquivGroup> mGroups = Collections.newSetFromMap(new 
IdentityHashMap<EquivGroup, Boolean>());
-
-    for (Object object : keySet) {
-      EquivGroup group = objectMap.get(object);
-      if (group != null) {
-        mGroups.add(group);
-      }
+    // Caches signatures on the first access. A signature of an Operator could 
change as optimizers could mutate it,
+    // keeping its semantics. The current implementation caches the signature 
before optimizations so that we can
+    // link Operators with their signatures at consistent timing.

Review Comment:
   please remove this comment and don't pre-heat caches unless it gives real 
benefits
   the purpose of the cache is not something like this ; but to prevent 
`O(N^2)` computation ( and also to reduce storage size by making it more 
serialize friendly...)



##########
ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/PlanMapper.java:
##########
@@ -200,54 +199,46 @@ public void merge(Object o1, Object o2) {
   }
 
   private void link(Object o1, Object o2, boolean mayMerge) {
-
-    Set<Object> keySet = Collections.newSetFromMap(new IdentityHashMap<Object, 
Boolean>());
-    keySet.add(o1);
-    keySet.add(o2);
-    keySet.add(getKeyFor(o1));
-    keySet.add(getKeyFor(o2));
-
-    Set<EquivGroup> mGroups = Collections.newSetFromMap(new 
IdentityHashMap<EquivGroup, Boolean>());
-
-    for (Object object : keySet) {
-      EquivGroup group = objectMap.get(object);
-      if (group != null) {
-        mGroups.add(group);
-      }
+    // Caches signatures on the first access. A signature of an Operator could 
change as optimizers could mutate it,
+    // keeping its semantics. The current implementation caches the signature 
before optimizations so that we can
+    // link Operators with their signatures at consistent timing.
+    registerSignature(o1);
+    registerSignature(o2);

Review Comment:
   > This change reverts that logic and takes signatures out of the equation 
for determining equivalent groups.
   
   haven't read all of it - but if thats true - please don't do that...
   looking at the volume of changes - it must be changing something 
fundamental...
   
   what's the issue - is it an equiv  violation? can you put the stacktrace 
somewhere?
   I think that should be fixed without altering this part - it might be a 
missing link; or a missing signature annotation somewhere - you should not 
change this at all... 
   
   all these things could enable to back-map runtime statistict up to the 
calcite join planning phase...but I think that feature was never 
completed/enabled



##########
ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/PlanMapper.java:
##########
@@ -286,8 +271,13 @@ public Iterator<EquivGroup> iterateGroups() {
   }
 
   public OpTreeSignature getSignatureOf(Operator<?> op) {
-    OpTreeSignature sig = signatureCache.getSignature(op);
-    return sig;
+    return signatureCache.getSignature(op);
+  }
+
+  private void registerSignature(Object o) {
+    if (o instanceof Operator) {
+      getSignatureOf((Operator<?>) o);
+    }
   }

Review Comment:
   whats the point of this method? the cache will compute it if needed...what 
the point of pre-registering - if that's needed its not a *cache* anymore!



##########
ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/PlanMapper.java:
##########
@@ -200,54 +199,46 @@ public void merge(Object o1, Object o2) {
   }
 
   private void link(Object o1, Object o2, boolean mayMerge) {
-
-    Set<Object> keySet = Collections.newSetFromMap(new IdentityHashMap<Object, 
Boolean>());
-    keySet.add(o1);
-    keySet.add(o2);
-    keySet.add(getKeyFor(o1));
-    keySet.add(getKeyFor(o2));
-
-    Set<EquivGroup> mGroups = Collections.newSetFromMap(new 
IdentityHashMap<EquivGroup, Boolean>());
-
-    for (Object object : keySet) {
-      EquivGroup group = objectMap.get(object);
-      if (group != null) {
-        mGroups.add(group);
-      }
+    // Caches signatures on the first access. A signature of an Operator could 
change as optimizers could mutate it,
+    // keeping its semantics. The current implementation caches the signature 
before optimizations so that we can
+    // link Operators with their signatures at consistent timing.
+    registerSignature(o1);
+    registerSignature(o2);
+    final EquivGroup linkedGroup1 = objectMap.get(o1);
+    final EquivGroup linkedGroup2 = objectMap.get(o2);
+
+    if (linkedGroup1 == null && linkedGroup2 == null) {
+      final EquivGroup group = new EquivGroup();
+      group.add(o1);
+      group.add(o2);
+      groups.add(group);
+      return;
     }
-    if (mGroups.size() > 1) {
-      if (!mayMerge) {
-        throw new RuntimeException("equivalence mapping violation");
-      }
-      EquivGroup newGrp = new EquivGroup();
-      newGrp.add(o1);
-      newGrp.add(o2);
-      for (EquivGroup g : mGroups) {
-        for (Object o : g.members) {
-          newGrp.add(o);
-        }
-      }
-      groups.add(newGrp);
-      groups.removeAll(mGroups);
-    } else {
-      EquivGroup targetGroup = mGroups.isEmpty() ? new EquivGroup() : 
mGroups.iterator().next();
-      groups.add(targetGroup);
-      targetGroup.add(o1);
-      targetGroup.add(o2);
+    if (linkedGroup1 == null || linkedGroup2 == null || linkedGroup1 == 
linkedGroup2) {
+      final EquivGroup group = linkedGroup1 != null ? linkedGroup1 : 
linkedGroup2;
+      group.add(o1);
+      group.add(o2);
+      return;
     }
 
+    if (!mayMerge) {
+      throw new RuntimeException(String.format(
+          "Failed to link %s and %s. This error mostly means a bug of Hive",

Review Comment:
   I find this error message pretty cloudy on what the issue is... 
   
   I think the sentecte "Equivalence mapping violation" should be there - as 
that's what the problem is...we have 2 groups; not allowed to put them into the 
same equiv group -> big trouble...
   
   



##########
ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/PlanMapper.java:
##########
@@ -200,54 +199,46 @@ public void merge(Object o1, Object o2) {
   }
 
   private void link(Object o1, Object o2, boolean mayMerge) {
-
-    Set<Object> keySet = Collections.newSetFromMap(new IdentityHashMap<Object, 
Boolean>());
-    keySet.add(o1);
-    keySet.add(o2);
-    keySet.add(getKeyFor(o1));
-    keySet.add(getKeyFor(o2));
-
-    Set<EquivGroup> mGroups = Collections.newSetFromMap(new 
IdentityHashMap<EquivGroup, Boolean>());
-
-    for (Object object : keySet) {
-      EquivGroup group = objectMap.get(object);
-      if (group != null) {
-        mGroups.add(group);
-      }
+    // Caches signatures on the first access. A signature of an Operator could 
change as optimizers could mutate it,
+    // keeping its semantics. The current implementation caches the signature 
before optimizations so that we can
+    // link Operators with their signatures at consistent timing.
+    registerSignature(o1);
+    registerSignature(o2);
+    final EquivGroup linkedGroup1 = objectMap.get(o1);
+    final EquivGroup linkedGroup2 = objectMap.get(o2);

Review Comment:
   these changes will weaken this stuff - so that 2 OPs which have the same 
signature will be allowed -> that's not good....
   its not causing much issues yet - because almost all existing operators have 
good signatures; but if that degrades...runtime stats will be applied 
incorrectly



##########
ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/PlanMapper.java:
##########
@@ -200,54 +199,46 @@ public void merge(Object o1, Object o2) {
   }
 
   private void link(Object o1, Object o2, boolean mayMerge) {
-
-    Set<Object> keySet = Collections.newSetFromMap(new IdentityHashMap<Object, 
Boolean>());
-    keySet.add(o1);
-    keySet.add(o2);
-    keySet.add(getKeyFor(o1));
-    keySet.add(getKeyFor(o2));
-
-    Set<EquivGroup> mGroups = Collections.newSetFromMap(new 
IdentityHashMap<EquivGroup, Boolean>());
-
-    for (Object object : keySet) {
-      EquivGroup group = objectMap.get(object);
-      if (group != null) {
-        mGroups.add(group);
-      }
+    // Caches signatures on the first access. A signature of an Operator could 
change as optimizers could mutate it,
+    // keeping its semantics. The current implementation caches the signature 
before optimizations so that we can
+    // link Operators with their signatures at consistent timing.
+    registerSignature(o1);
+    registerSignature(o2);

Review Comment:
   to get a failing test for these changes you would need 2 ops which have 
conflicting signatures stored into the metastore ; loaded back and they might 
get applied incorrectly...
   possibly pretty hard to do...



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to