okumin commented on code in PR #5037:
URL: https://github.com/apache/hive/pull/5037#discussion_r1476160515


##########
ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/PlanMapper.java:
##########
@@ -200,54 +199,46 @@ public void merge(Object o1, Object o2) {
   }
 
   private void link(Object o1, Object o2, boolean mayMerge) {
-
-    Set<Object> keySet = Collections.newSetFromMap(new IdentityHashMap<Object, 
Boolean>());
-    keySet.add(o1);
-    keySet.add(o2);
-    keySet.add(getKeyFor(o1));
-    keySet.add(getKeyFor(o2));
-
-    Set<EquivGroup> mGroups = Collections.newSetFromMap(new 
IdentityHashMap<EquivGroup, Boolean>());
-
-    for (Object object : keySet) {
-      EquivGroup group = objectMap.get(object);
-      if (group != null) {
-        mGroups.add(group);
-      }
+    // Caches signatures on the first access. A signature of an Operator could 
change as optimizers could mutate it,
+    // keeping its semantics. The current implementation caches the signature 
before optimizations so that we can
+    // link Operators with their signatures at consistent timing.
+    registerSignature(o1);
+    registerSignature(o2);

Review Comment:
   > Assuming that we do not allow implicit linking via signatures I don't know 
what's the point of registering the signature at this stage
   
   That's because some optimization can mutate Operators, changing their true 
signatures. [One example I remember is 
TableScanPPD](https://github.com/apache/hive/blob/rel/release-4.0.0-beta-1/ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java#L414).
 I'm not 100% sure which signatures should be used in that case, but it sounds 
more reasonable and predictable for me to use ones before optimization than 
ones in the middle of optimization. I would say Operators should not be mutable 
for the signing purpose, but it is a too fundamental change.
   
   I will make a reply to @kgyrtkirk in another thread as I guess those 
comments were written before reading my gist.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to