okumin commented on code in PR #5037:
URL: https://github.com/apache/hive/pull/5037#discussion_r1476260894
##########
ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/PlanMapper.java:
##########
@@ -200,54 +199,46 @@ public void merge(Object o1, Object o2) {
}
private void link(Object o1, Object o2, boolean mayMerge) {
-
- Set<Object> keySet = Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());
- keySet.add(o1);
- keySet.add(o2);
- keySet.add(getKeyFor(o1));
- keySet.add(getKeyFor(o2));
-
- Set<EquivGroup> mGroups = Collections.newSetFromMap(new IdentityHashMap<EquivGroup, Boolean>());
-
- for (Object object : keySet) {
- EquivGroup group = objectMap.get(object);
- if (group != null) {
- mGroups.add(group);
- }
+ // Caches signatures on the first access. A signature of an Operator could change as optimizers could mutate it,
+ // keeping its semantics. The current implementation caches the signature before optimizations so that we can
+ // link Operators with their signatures at consistent timing.
+ registerSignature(o1);
+ registerSignature(o2);
Review Comment:
I forgot to mention what is affected by such optimizations. Many test cases
on the following revision failed with `equivalence mapping violation`.
https://github.com/apache/hive/pull/5037/commits/217b26db7eee95a0d25725df985adc79705a80d6
They fail because that revision tries to LINK a pre-optimization operator with
a post-optimization operator.
1. Before optimization
- Operator A has a signature SA and is grouped as GA
- Operator B has a signature SB and is grouped as GB
2. AuxSignatureLinker doesn't MERGE GA and GB because A and B have different
signatures. Note that non-aux signatures are not cached here
3. After optimization
- Operator B is optimized, and its signature becomes SA
4. StatsRulesProcFactory tries to LINK GA and GB because A and B now share the
same signature SA -> equivalence mapping violation
GA and GB may in fact be mergeable, though the current implementation doesn't
merge them.
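To make steps 1-4 concrete, here is a minimal, self-contained sketch. It is not the real `PlanMapper`: `Op`, `ToyMapper`, and `replay` are hypothetical names, groups are plain integers, and the signature is just a mutable string. It only illustrates why reading the signature after the operator has been mutated can trip a non-merging link, and why caching the signature on first access avoids that.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Set;

class SignatureCachingSketch {

    /** Hypothetical operator: its signature is derived from mutable state. */
    static final class Op {
        String shape;
        Op(String shape) { this.shape = shape; }
        String currentSignature() { return shape; }
    }

    /** Toy mapper: each operator and each signature belongs to at most one group. */
    static final class ToyMapper {
        private final boolean cacheSignatures;
        private final Map<Op, String> signatureCache = new IdentityHashMap<>();
        private final Map<Op, Integer> groupByOp = new IdentityHashMap<>();
        private final Map<String, Integer> groupBySignature = new HashMap<>();
        private int nextGroupId = 0;

        ToyMapper(boolean cacheSignatures) { this.cacheSignatures = cacheSignatures; }

        private String signatureOf(Op op) {
            if (!cacheSignatures) {
                return op.currentSignature();
            }
            // Freeze the signature on first access, before the operator is mutated.
            return signatureCache.computeIfAbsent(op, Op::currentSignature);
        }

        /** Links without merging; throws if the keys already span two groups. */
        void link(Op o1, Op o2) {
            String s1 = signatureOf(o1);
            String s2 = signatureOf(o2);
            Set<Integer> groups = new HashSet<>();
            Integer[] candidates = {
                groupByOp.get(o1), groupByOp.get(o2),
                groupBySignature.get(s1), groupBySignature.get(s2)
            };
            for (Integer g : candidates) {
                if (g != null) {
                    groups.add(g);
                }
            }
            if (groups.size() > 1) {
                throw new IllegalStateException("equivalence mapping violation");
            }
            int group = groups.isEmpty() ? nextGroupId++ : groups.iterator().next();
            groupByOp.put(o1, group);
            groupByOp.put(o2, group);
            groupBySignature.put(s1, group);
            groupBySignature.put(s2, group);
        }
    }

    /** Replays steps 1-4; returns true if the final link raises a violation. */
    static boolean replay(boolean cacheSignatures) {
        ToyMapper mapper = new ToyMapper(cacheSignatures);
        Op a = new Op("SA");
        Op b = new Op("SB");
        mapper.link(a, a);   // step 1: A with signature SA forms group GA
        mapper.link(b, b);   // step 1: B with signature SB forms group GB
        b.shape = "SA";      // step 3: an optimizer mutates B, its signature becomes SA
        try {
            mapper.link(b, b);   // step 4: a later link touches B again
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("without caching, violation: " + replay(false));
        System.out.println("with caching, violation: " + replay(true));
    }
}
```

In this toy model the uncached run fails at step 4 because B still belongs to GB while its freshly computed signature SA belongs to GA. The cached run keeps using SB for B, so the link stays inside GB.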
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]