[GitHub] [calcite] zuozhiw commented on a change in pull request #2486: [CALCITE-4737] Add Volcano Visualizer for Debugging

GitBox Mon, 16 Aug 2021 08:51:49 -0700


zuozhiw commented on a change in pull request #2486:
URL: https://github.com/apache/calcite/pull/2486#discussion_r689656891




##########
File path: 
core/src/main/java/org/apache/calcite/plan/volcano/VolcanoRuleMatchVisualizer.java
##########
@@ -0,0 +1,300 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.calcite.plan.volcano;
+
+import org.apache.calcite.plan.RelOptCluster;
+import org.apache.calcite.plan.RelOptCost;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.tools.visualizer.InputExcludedRelWriter;
+import org.apache.calcite.tools.visualizer.VisualizerNodeInfo;
+import org.apache.calcite.tools.visualizer.VisualizerRuleMatchInfo;
+
+import org.apache.commons.io.IOUtils;
+
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.google.common.base.Charsets;
+
+import java.io.IOException;
+import java.io.UncheckedIOException;
+import java.nio.charset.StandardCharsets;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.StandardOpenOption;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Deque;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.function.Consumer;
+import java.util.stream.Collectors;
+
+import static java.util.stream.Collectors.joining;
+
+/**
+ * This is tool to visualize the rule match process of the VolcanoPlanner.
+ *
+ *
+ * <p>To use the visualizer, add a listener before the VolcanoPlanner optimization phase.
+ * Then writes the output to a file after the optimization ends.
+ *
+ * <pre>
+ * // construct the visualizer and attach a listener to VolcanoPlanner
+ * VolcanoRuleMatchVisualizerListener visualizerListener =
+ *   new VolcanoRuleMatchVisualizerListener(volcanoPlanner);
+ * volcanoPlanner.addListener(visualizerListener);
+ *
+ * volcanoPlanner.findBestExpr();
+ *
+ * // after the optimization, adds the final best plan
+ * visualizerListener.getVisualizer().addFinalPlan();
+ * // writes the output to files
+ * visualizerListener.getVisualizer().writeToFile(outputDirectory, "");
+ * </pre>
+ */
+public class VolcanoRuleMatchVisualizer {
+
+  VolcanoPlanner volcanoPlanner;
+
+  // a sequence of ruleMatch ID to represent the order of rule match
+  List<String> ruleMatchSequence = new ArrayList<>();
+  // map of ruleMatch ID and the info, including the state snapshot at the time of ruleMatch
+  Map<String, VisualizerRuleMatchInfo> ruleInfoMap = new HashMap<>();
+  // map of nodeID to the ruleID it's first added
+  Map<String, String> nodeAddedInRule = new HashMap<>();
+
+  // a map of relNode ID to the actual RelNode object
+  // contains all the relNodes appear during the optimization
+  // all RelNode are immutable in Calcite, therefore only new nodes will be added
+  Map<String, RelNode> allNodes = new HashMap<>();
+
+  public VolcanoRuleMatchVisualizer(VolcanoPlanner volcanoPlanner) {
+    this.volcanoPlanner = volcanoPlanner;
+  }
+
+  public void addRuleMatch(String ruleCallID, Collection<? extends RelNode> matchedRels) {
+
+    // store the current state snapshot
+    // nodes contained in the sets
+    // and inputs of relNodes (and relSubsets)
+    Map<String, String> setLabels = new HashMap<>();
+    Map<String, String> setOriginalRel = new HashMap<>();
+    Map<String, Set<String>> nodesInSet = new HashMap<>();
+    Map<String, Set<String>> nodeInputs = new HashMap<>();
+
+    // newNodes appeared after this ruleCall
+    Set<String> newNodes = new HashSet<>();
+
+    // populate current snapshot, and fill in the allNodes map
+    volcanoPlanner.allSets.forEach(set -> {
+      String setID = "set-" + set.id;
+      String setLabel = getSetLabel(set);
+      setLabels.put(setID, setLabel);
+      setOriginalRel.put(setID, set.rel == null ? "" : String.valueOf(set.rel.getId()));
+
+      nodesInSet.put(setID, nodesInSet.getOrDefault(setID, new HashSet<>()));
+
+      Consumer<RelNode> addNode = rel -> {
+        String nodeID = String.valueOf(rel.getId());
+        nodesInSet.get(setID).add(nodeID);
+
+        if (!allNodes.containsKey(nodeID)) {
+          newNodes.add(nodeID);
+          allNodes.put(nodeID, rel);
+        }
+      };
+
+      Consumer<RelNode> addLink = rel -> {
+        String nodeID = String.valueOf(rel.getId());
+        nodeInputs.put(nodeID, new HashSet<>());
+        if (rel instanceof RelSubset) {
+          RelSubset relSubset = (RelSubset) rel;
+          relSubset.getRelList().stream()
+              .filter(input -> input.getTraitSet().equals(relSubset.getTraitSet()))
+              .forEach(input -> nodeInputs.get(nodeID).add(String.valueOf(input.getId())));

Review comment:
   This was a design tradeoff to make the links within a set less messy. In the environment I was working in, a set contains multiple subsets with related traits, so a relNode belongs to many subsets at the same time.

   For example, suppose there are 2 subsets `S1`, `S2` (`S2` satisfies `S1`) and 2 relNodes `R1`, `R2` with the same traits as `S2`.

   Instead of creating 4 links for the relNodes (`R1 -> S1`, `R1 -> S2`, `R2 -> S1`, `R2 -> S2`), I decided to draw 2 links for the relNodes (`R1 -> S2` and `R2 -> S2`) and 1 link between the subsets (`S2 -> S1`). The subset satisfaction relationship is then shown indirectly: `R1` and `R2` reach `S1` through `S2`.

   This way, in a set with many subsets and relNodes, the links are much less messy.

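   The `S1`/`S2` example can be sketched in isolation. This is only an illustration of the idea, not the code in the PR: trait sets are modelled as plain sets of strings, and the class `SubsetLinkSketch`, its methods, and the "contains all traits" satisfaction rule are all assumptions made for the sketch. It also links every satisfying pair of subsets and makes no attempt to prune transitive subset links.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative only: trait sets are modelled as plain sets of strings. */
public class SubsetLinkSketch {

  /** Assumed rule: a trait set satisfies another if it contains all of its traits. */
  public static boolean satisfies(Set<String> traits, Set<String> required) {
    return traits.containsAll(required);
  }

  /** Naive drawing: link every node to every subset whose traits it satisfies. */
  public static List<String> allLinks(Map<String, Set<String>> nodes,
      Map<String, Set<String>> subsets) {
    List<String> links = new ArrayList<>();
    for (Map.Entry<String, Set<String>> node : nodes.entrySet()) {
      for (Map.Entry<String, Set<String>> subset : subsets.entrySet()) {
        if (satisfies(node.getValue(), subset.getValue())) {
          links.add(node.getKey() + " -> " + subset.getKey());
        }
      }
    }
    return links;
  }

  /**
   * Reduced drawing: link a node only to the subset with exactly the same
   * traits, then link subsets to the other subsets they satisfy.
   */
  public static List<String> reducedLinks(Map<String, Set<String>> nodes,
      Map<String, Set<String>> subsets) {
    List<String> links = new ArrayList<>();
    for (Map.Entry<String, Set<String>> node : nodes.entrySet()) {
      for (Map.Entry<String, Set<String>> subset : subsets.entrySet()) {
        if (node.getValue().equals(subset.getValue())) {
          links.add(node.getKey() + " -> " + subset.getKey());
        }
      }
    }
    for (Map.Entry<String, Set<String>> from : subsets.entrySet()) {
      for (Map.Entry<String, Set<String>> to : subsets.entrySet()) {
        if (!from.getKey().equals(to.getKey())
            && satisfies(from.getValue(), to.getValue())) {
          links.add(from.getKey() + " -> " + to.getKey());
        }
      }
    }
    return links;
  }

  public static void main(String[] args) {
    // S1 requires no traits; S2 requires "sorted", so S2 satisfies S1.
    Map<String, Set<String>> subsets = new LinkedHashMap<>();
    subsets.put("S1", Set.of());
    subsets.put("S2", Set.of("sorted"));

    // R1 and R2 have the same traits as S2.
    Map<String, Set<String>> nodes = new LinkedHashMap<>();
    nodes.put("R1", Set.of("sorted"));
    nodes.put("R2", Set.of("sorted"));

    System.out.println(allLinks(nodes, subsets));
    System.out.println(reducedLinks(nodes, subsets));
  }
}
```

   With 2 subsets and 2 relNodes the saving is small (4 links versus 3), but the naive count grows with the product of nodes and satisfied subsets, while the reduced drawing adds only one link per node plus the subset-to-subset links.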
