zabetak commented on code in PR #5452: URL: https://github.com/apache/hive/pull/5452#discussion_r1795599493
########## common/src/java/org/apache/hadoop/hive/conf/HiveConf.java: ########## @@ -809,6 +809,10 @@ public static enum ConfVars { HIVE_IN_TEST_REPL("hive.in.repl.test", false, "internal usage only, true in replication test mode", true), HIVE_IN_TEST_IDE("hive.in.ide.test", false, "internal usage only, true if test running in ide", true), + HIVE_IN_TEST_PLANMAPPER_STRICT_VALIDATION("hive.in.test.planmapper.strict.validation", false, + "internal use only, whether to raise an error when unexpected links are found. We ignore equivalence mapping " + + "violation because it introduces only minor problems. But we want to strictly check it in qtest so that we " + + "can prevent further degradations"), Review Comment: I am unsure about this newly introduced property. It may prevent future "degradation" but it puts an extra burden on contributors and reviewers. As a reviewer, when I see people setting on/off properties in tests I always ask why? And if the answer is "because there is a problem" then I will ask for more information. The contributor will try to understand what's going on and eventually it will find this ticket and argue that this feature is broken so its best to disable the property for their use-case (as out of scope). The reviewer may agree or push back and say that this is blocking the PR from merging. An exception is by definition serious, a warning on the other hand is not so much and I feel that for this use-case it should have been a warning in the first-place. ########## ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java: ########## @@ -991,35 +987,6 @@ private void removeSemiJoinIfNoStats(OptimizeTezProcContext procCtx) ogw.startWalking(topNodes, null); } - private static class CollectAll implements SemanticNodeProcessor { Review Comment: Why are we touching this? ########## ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/PlanMapper.java: ########## @@ -32,21 +32,34 @@ import java.util.Objects; import java.util.Set; +import java.util.concurrent.atomic.AtomicBoolean; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.hive.conf.HiveConf; +import org.apache.hadoop.hive.conf.HiveConf.ConfVars; import org.apache.hadoop.hive.ql.exec.Operator; import org.apache.hadoop.hive.ql.optimizer.signature.OpTreeSignature; import org.apache.hadoop.hive.ql.optimizer.signature.OpTreeSignatureFactory; import com.google.common.annotations.VisibleForTesting; import com.google.common.collect.Sets; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; /** * Enables to connect related objects to eachother. * * Most importantly it aids to connect Operators to OperatorStats and probably RelNodes. */ public class PlanMapper { + private static final Logger LOG = LoggerFactory.getLogger(PlanMapper.class); - Set<EquivGroup> groups = new HashSet<>(); - private Map<Object, EquivGroup> objectMap = new CompositeMap<>(OpTreeSignature.class, AuxOpTreeSignature.class); + private final Set<EquivGroup> groups = new HashSet<>(); + private final Map<Object, EquivGroup> objectMap = new CompositeMap<>(OpTreeSignature.class, AuxOpTreeSignature.class); + private final boolean failsWithIllegalLink; + private final AtomicBoolean isBroken = new AtomicBoolean(false); Review Comment: Why AtomicBoolean? The whole class does not look thread-safe. ########## ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/PlanMapper.java: ########## @@ -279,7 +295,6 @@ public <T> T lookup(Class<T> clazz, Object key) { return all.get(0); } - @VisibleForTesting Review Comment: Why removing the annotation? ########## ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/PlanMapper.java: ########## @@ -256,20 +277,15 @@ public <T> List<T> getAll(Class<T> clazz) { return ret; } - public void runMapper(GroupTransformer mapper) { - for (EquivGroup equivGroup : groups) { - mapper.map(equivGroup); - } - } - - public <T> List<T> lookupAll(Class<T> clazz, Object key) { + private <T> List<T> lookupAll(Class<T> clazz, Object key) { EquivGroup group = objectMap.get(key); if (group == null) { throw new NoSuchElementException(Objects.toString(key)); } return group.getAll(clazz); } + @VisibleForTesting Review Comment: Why adding the annotation? ########## ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/PlanMapper.java: ########## @@ -217,7 +230,11 @@ private void link(Object o1, Object o2, boolean mayMerge) { } if (mGroups.size() > 1) { if (!mayMerge) { - throw new RuntimeException("equivalence mapping violation"); + LOG.warn("Illegally linking {} and {}", o1, o2); + if (failsWithIllegalLink) { + throw new RuntimeException("equivalence mapping violation"); + } + isBroken.set(true); Review Comment: It seems to me that if we raise a warning and return we don't really break the `PlanMapper` since the "invalid" expression is not added. This would allow the mapper to be used based on the existing content. Is it necessary to invalidate everything that's already in? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org