Alex Behm has posted comments on this change. Change subject: IMPALA-2805: Order conjuncts based on selectivity and cost ......................................................................
Patch Set 13: (22 comments) http://gerrit.cloudera.org:8080/#/c/2598/13/fe/src/main/java/com/cloudera/impala/analysis/CaseExpr.java File fe/src/main/java/com/cloudera/impala/analysis/CaseExpr.java: Line 328: // Compute costas the sum of evaluating all of the WHEN exprs, plus typo: costas http://gerrit.cloudera.org:8080/#/c/2598/13/fe/src/main/java/com/cloudera/impala/analysis/CastExpr.java File fe/src/main/java/com/cloudera/impala/analysis/CastExpr.java: Line 205: if (getChild(0).hasCost()) evalCost_ = getChild(0).getCost() + CAST_COST; can't this go in analyze() below? then you can remove the extra cost computation in the "pre-analyzed" c'tor http://gerrit.cloudera.org:8080/#/c/2598/13/fe/src/main/java/com/cloudera/impala/analysis/ExistsPredicate.java File fe/src/main/java/com/cloudera/impala/analysis/ExistsPredicate.java: Line 60: if (getChild(0).hasCost()) evalCost_ = getChild(0).getCost() + EXISTS_COST; remove http://gerrit.cloudera.org:8080/#/c/2598/13/fe/src/main/java/com/cloudera/impala/analysis/Expr.java File fe/src/main/java/com/cloudera/impala/analysis/Expr.java: Line 76: public final static float EXISTS_COST = 1; Remove because EXISTS is rewritten into a join or rejected. Line 187: // its children. Set during analysis. Set to -1 if the cost could not be estimated. Are we really setting the cost to -1 in cases where the cost could not be estimated? In PlanNode.orderConjunctsByCost() we have a Preconditions check that asserts all conjuncts need to have a cost. Seems like we should either treat those with -1 specially when ordering, or this comment is not accurate and should be fixed. Line 1236: protected int getChildCosts() { Is cost supposed to be an int or a float? You sometimes use int and sometimes float (e.g., in the evalCost_ member you use float but here you use int) Sticking to int seems fine to me, unless there is a reason not to. Line 1238: for (Expr child : children_) { single line if it fits http://gerrit.cloudera.org:8080/#/c/2598/13/fe/src/main/java/com/cloudera/impala/planner/AnalyticEvalNode.java File fe/src/main/java/com/cloudera/impala/planner/AnalyticEvalNode.java: Line 121: conjuncts_ = orderConjunctsByCost(conjuncts_); replace this with a Preconditions.checkState(conjuncts_.isEmpty()); because we should never assign conjuncts to an AnalyticEvalNode http://gerrit.cloudera.org:8080/#/c/2598/13/fe/src/main/java/com/cloudera/impala/planner/HashJoinNode.java File fe/src/main/java/com/cloudera/impala/planner/HashJoinNode.java: Line 102: eqJoinConjuncts_ = orderConjunctsByCost(eqJoinConjuncts_); save a line and just pass in newEqJoinConjuncts here http://gerrit.cloudera.org:8080/#/c/2598/13/fe/src/main/java/com/cloudera/impala/planner/NestedLoopJoinNode.java File fe/src/main/java/com/cloudera/impala/planner/NestedLoopJoinNode.java: Line 70: eqJoinConjuncts_ = orderConjunctsByCost(eqJoinConjuncts_); also order conjuncts_ You might want to add a helper function to JoinNode that orders eqJoinConjuncts_, otherJoinConjuncts_, and conjuncts_ http://gerrit.cloudera.org:8080/#/c/2598/13/fe/src/main/java/com/cloudera/impala/planner/PlanNode.java File fe/src/main/java/com/cloudera/impala/planner/PlanNode.java: Line 406: conjuncts_ = orderConjunctsByCost(conjuncts_); Does this lead to duplicate ordering since we already do the ordering in the individual plan node's init? We might as well have all PlanNodes invoke themselves. It seems cleaner to me to separate the conjunct assignment from the ordering, since many plan nodes have some custom logic when/what to order. Line 638: * As in computeSelecivity, the selectivities are exponentially backed off over the computeSelectivity -> computeCombinedSelectivity() Line 646: List<T> remaining = new ArrayList<T>(); Lists.newArrayListWithCapacity(conjuncts.size()); Line 653: List<T> sortedConjuncts = new ArrayList<T>(); Lists.newArrayListWithCapacity(conjuncts.size()); Line 654: while (remaining.size() > 0) { while (!remaining.isEmpty()) Line 656: T optConjunct = null; bestConjunct? optConjunct could be misleading (opt == optional?) Line 657: double backoff_exp = 1.0 / (double) (sortedConjuncts.size() + 1); backoffExponent Why is the backoff even needed? We're trying to get a relative ordering of the conjuncts and the same factor is applied to the selectivity of all remaining candidates below, so could we not just omit this backoff? It doesn't seem like the backoff would change the ordering. Line 667: // applying this conjunct to all tuples plus the cost of applying all the tuples -> rows (and elsewhere) http://gerrit.cloudera.org:8080/#/c/2598/13/fe/src/main/java/com/cloudera/impala/planner/SortNode.java File fe/src/main/java/com/cloudera/impala/planner/SortNode.java: Line 120: conjuncts_ = orderConjunctsByCost(conjuncts_); We should never assign conjuncts to this node, so you can add a Preconditions.checkState() after assignConjuncts() that the conjuncts_ are empty. Lmk if I'm missing something. http://gerrit.cloudera.org:8080/#/c/2598/13/fe/src/main/java/com/cloudera/impala/planner/SubplanNode.java File fe/src/main/java/com/cloudera/impala/planner/SubplanNode.java: Line 67: conjuncts_ = orderConjunctsByCost(conjuncts_); please move as close as possible to where the conjuncts as assigned (and elsewhere) http://gerrit.cloudera.org:8080/#/c/2598/13/testdata/workloads/functional-planner/queries/PlannerTest/conjunct-ordering.test File testdata/workloads/functional-planner/queries/PlannerTest/conjunct-ordering.test: Line 2: select * I think we need more tests here to cover the different plan nodes and the different types of conjuncts. Lmk if you need help coming up with queries to cover the different cases. Line 11: select * For each test add a brief comment what it is covering. -- To view, visit http://gerrit.cloudera.org:8080/2598 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I02279a26fbc6308ac5eb819d78345fc010469034 Gerrit-PatchSet: 13 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Thomas Tauber-Marshall <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Henry Robinson <[email protected]> Gerrit-Reviewer: Marcel Kornacker <[email protected]> Gerrit-Reviewer: Matthew Jacobs <[email protected]> Gerrit-Reviewer: Mostafa Mokhtar <[email protected]> Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]> Gerrit-HasComments: Yes
