[ https://issues.apache.org/jira/browse/HIVE-24241?focusedWorklogId=505853&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505853 ]

ASF GitHub Bot logged work on HIVE-24241:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 28/Oct/20 18:38
            Start Date: 28/Oct/20 18:38
    Worklog Time Spent: 10m 
      Work Description: kgyrtkirk commented on a change in pull request #1562:
URL: https://github.com/apache/hive/pull/1562#discussion_r513671527



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorGraph.java
##########
@@ -0,0 +1,231 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.optimizer;
+
+import java.io.File;
+import java.io.PrintWriter;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.hadoop.hive.ql.exec.AppMasterEventOperator;
+import org.apache.hadoop.hive.ql.exec.Operator;
+import org.apache.hadoop.hive.ql.exec.ReduceSinkOperator;
+import org.apache.hadoop.hive.ql.exec.TableScanOperator;
+import org.apache.hadoop.hive.ql.optimizer.calcite.rules.HivePointLookupOptimizerRule.DiGraph;
+import org.apache.hadoop.hive.ql.parse.ParseContext;
+import org.apache.hadoop.hive.ql.parse.SemiJoinBranchInfo;
+import org.apache.hadoop.hive.ql.plan.DynamicPruningEventDesc;
+
+import com.google.common.collect.Sets;
+
+public class OperatorGraph {

Review comment:
       @jcamachor this is the checker class I was talking about - right now it builds on top of the basic `DiGraph` class I introduced some time ago in `HivePointLookupOptimizerRule`
   
   

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorGraph.java
##########
@@ -0,0 +1,231 @@
+public class OperatorGraph {
+
+  /**
+   * A directed graph extended with support to check dag property.
+   */
+  static class DagGraph<V, E> extends DiGraph<V, E> {

Review comment:
       we can definitely roll our own graph representation; however, I feel it would sometimes make things easier to have access to basic graph algorithms (for example, to do a topological-order walk). There is a small library called [jgrapht](https://jgrapht.org/) (EPL 2.0 license - I think that will be okay) which could be utilized for these kinds of things.
   
   @jcamachor what do you think about pulling in the jgrapht lib and removing the makeshift digraph classes?
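   
   a rough sketch of what that could look like - the operator names below are made up, but `DirectedAcyclicGraph` and `TopologicalOrderIterator` are standard jgrapht classes:
   
```java
import org.jgrapht.graph.DefaultEdge;
import org.jgrapht.graph.DirectedAcyclicGraph;
import org.jgrapht.traverse.TopologicalOrderIterator;

public class JGraphTSketch {
  public static void main(String[] args) {
    // DirectedAcyclicGraph checks acyclicity on every edge insertion
    DirectedAcyclicGraph<String, DefaultEdge> dag =
        new DirectedAcyclicGraph<>(DefaultEdge.class);
    dag.addVertex("TS_0");
    dag.addVertex("RS_1");
    dag.addVertex("JOIN_2");
    dag.addEdge("TS_0", "RS_1");
    dag.addEdge("RS_1", "JOIN_2");
    // the next line would throw IllegalArgumentException - the same check
    // that DagNode.addEdge implements by hand:
    // dag.addEdge("JOIN_2", "TS_0");

    // and a topological-order walk comes for free
    new TopologicalOrderIterator<>(dag).forEachRemaining(System.out::println);
  }
}
```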

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorGraph.java
##########
@@ -0,0 +1,231 @@
+public class OperatorGraph {
+
+  /**
+   * A directed graph extended with support to check dag property.
+   */
+  static class DagGraph<V, E> extends DiGraph<V, E> {
+
+    static class DagNode<V, E> extends Node<V, E> {
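+      // invariant: every edge runs from a node with a strictly higher dagIdx
+      // to one with a lower dagIdx, so the graph can never contain a cycle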
+      int dagIdx = 0;
+      public DagNode(V v) {
+        super(v);
+      }
+
+      @Override
+      public void addEdge(Edge<V, E> edge) {
+        if (edge.s == this) {
+          DagNode<V, E> t = (DagNode<V, E>) edge.t;
+          ensureDagIdxAtLeast(t.dagIdx + 1);
+          if (t.dagIdx > dagIdx) {
+            throw new IllegalArgumentException("adding this edge would violate dag properties");
+          }
+        }
+        super.addEdge(edge);
+      }
+
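+      // raises dagIdx to at least min and propagates the requirement to all
+      // predecessors; on a cycle the propagation wraps around and lifts the
+      // target's index above the source's, which addEdge detects and rejects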
+      void ensureDagIdxAtLeast(int min) {
+        if (dagIdx >= min) {
+          return;
+        }
+        dagIdx = min;
+        for (Edge<V, E> e : edges) {
+          if (e.t == this) {
+            DagNode<V, E> s = (DagNode<V, E>) e.s;
+            s.ensureDagIdxAtLeast(min + 1);
+          }
+        }
+      }
+    }
+
+    @Override
+    protected Node<V, E> newNode(V s) {
+      return new DagNode<V, E>(s);
+    }
+  }
+
+  DagGraph<Operator<?>, OpEdge> g;
+
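+  // FLOW is a regular parent->child dataflow edge; SEMIJOIN and DPP mark
+  // pruning edges that point back at the TableScan they prune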
+  enum EdgeType {
+    FLOW, SEMIJOIN, DPP, TEST,
+  }
+
+  static class OpEdge {
+
+    private final EdgeType et;
+    private final int index;
+
+    public OpEdge(EdgeType et) {
+      this(et, 0);
+    }
+
+    public OpEdge(EdgeType et, int index) {
+      this.et = et;
+      this.index = index;
+    }
+
+  }
+
+
+  Map<Operator<?>, Cluster> nodeCluster = new HashMap<>();
+
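+  // a Cluster groups operators that are connected without crossing a
+  // ReduceSink boundary (roughly: operators that execute together)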
+  public class Cluster {
+
+    Set<Operator<?>> members = new LinkedHashSet<>();
+
+    public void merge(Cluster o) {
+      for (Operator<?> node : o.members) {
+        add(node);
+      }
+      o.members.clear();
+    }
+
+    public void add(Operator<?> curr) {
+      nodeCluster.put(curr, this);
+      members.add(curr);
+    }
+
+  }
+
+
+  public OperatorGraph(ParseContext pctx) {
+    g = new DagGraph<Operator<?>, OperatorGraph.OpEdge>();
+    Set<Operator<?>> visited = Sets.newIdentityHashSet();
+    Set<Operator<?>> seen = Sets.newIdentityHashSet();
+
+    seen.addAll(pctx.getTopOps().values());
+    while (!seen.isEmpty()) {
+      Operator<?> curr = seen.iterator().next();
+      seen.remove(curr);
+      if (visited.contains(curr)) {
+        continue;
+      }
+
+      visited.add(curr);
+
+      Cluster currentCluster = nodeCluster.get(curr);
+      if (currentCluster == null) {
+        currentCluster = new Cluster();
+        currentCluster.add(curr);
+      }
+      List<Operator<?>> parents = curr.getParentOperators();
+      for (int i = 0; i < parents.size(); i++) {
+        Operator<?> p = parents.get(i);
+        g.putEdgeValue(p, curr, new OpEdge(EdgeType.FLOW, i));
+        if (p instanceof ReduceSinkOperator) {
+          // ignore cluster of parent RS
+          continue;
+        }
+        Cluster cluster = nodeCluster.get(p);
+        if (cluster != null) {
+          currentCluster.merge(cluster);
+        } else {
+          currentCluster.add(p);
+        }
+      }
+
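+      // semijoin reduction and dynamic partition pruning contribute extra
+      // non-dataflow edges pointing at the TableScan they prune; the target
+      // TS is queued as well so that its branch is also visited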
+      SemiJoinBranchInfo sji = pctx.getRsToSemiJoinBranchInfo().get(curr);
+      if (sji != null) {
+        g.putEdgeValue(curr, sji.getTsOp(), new OpEdge(EdgeType.SEMIJOIN));
+        seen.add(sji.getTsOp());
+      }
+      if (curr instanceof AppMasterEventOperator) {
+        DynamicPruningEventDesc dped = (DynamicPruningEventDesc) curr.getConf();
+        TableScanOperator ts = dped.getTableScan();
+        g.putEdgeValue(curr, ts, new OpEdge(EdgeType.DPP));
+        seen.add(ts);
+      }
+
+      List<Operator<?>> ccc = curr.getChildOperators();
+      for (Operator<?> operator : ccc) {
+        seen.add(operator);
+      }
+    }
+  }
+
+  public void toDot(File outFile) throws Exception {

Review comment:
       this is not called from anywhere right now - but I've used it during debugging to get a better understanding of the plan
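   
   (the output is a plain graphviz dot file, so - assuming graphviz is installed - something like `dot -Tpng og.dot -o og.png` renders it; the file name is just an example)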




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 505853)
    Time Spent: 1.5h  (was: 1h 20m)

> Enable SharedWorkOptimizer to merge downstream operators after an 
> optimization step
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-24241
>                 URL: https://issues.apache.org/jira/browse/HIVE-24241
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Zoltan Haindrich
>            Assignee: Zoltan Haindrich
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)
