Thomas Tauber-Marshall has posted comments on this change. Change subject: IMPALA-2805: Order conjuncts based on selectivity and cost ......................................................................
Patch Set 13: (20 comments) Updated to the latest review and discussion. Currently running perf tests and I'll post the results when I get them. Still need to write more functional tests, but I'd rather wait to be sure we are happy with our cost model before doing that. http://gerrit.cloudera.org:8080/#/c/2598/13/fe/src/main/java/com/cloudera/impala/analysis/CaseExpr.java File fe/src/main/java/com/cloudera/impala/analysis/CaseExpr.java: Line 328: // Compute costas the sum of evaluating all of the WHEN exprs, plus > typo: costas Done http://gerrit.cloudera.org:8080/#/c/2598/13/fe/src/main/java/com/cloudera/impala/analysis/CastExpr.java File fe/src/main/java/com/cloudera/impala/analysis/CastExpr.java: Line 205: if (getChild(0).hasCost()) evalCost_ = getChild(0).getCost() + CAST_COST; > can't this go in analyze() below? then you can remove the extra cost comput Done http://gerrit.cloudera.org:8080/#/c/2598/13/fe/src/main/java/com/cloudera/impala/analysis/ExistsPredicate.java File fe/src/main/java/com/cloudera/impala/analysis/ExistsPredicate.java: Line 60: if (getChild(0).hasCost()) evalCost_ = getChild(0).getCost() + EXISTS_COST; > remove Done http://gerrit.cloudera.org:8080/#/c/2598/13/fe/src/main/java/com/cloudera/impala/analysis/Expr.java File fe/src/main/java/com/cloudera/impala/analysis/Expr.java: Line 76: public final static float EXISTS_COST = 1; > Remove because EXISTS is rewritten into a join or rejected. Done Line 187: // its children. Set during analysis. Set to -1 if the cost could not be estimated. > Are we really setting the cost to -1 in cases where the cost could not be e I updated the comment to (hopefully) clarify this. Line 1236: protected int getChildCosts() { > Is cost supposed to be an int or a float? You sometimes use int and sometim I had originally used int, but Marcel pointed out that float would leave us more flexibility in the future, eg. if we start basing cost values off of a benchmark. I just missed this spot when refactoring everything to use floats. Line 1238: for (Expr child : children_) { > single line if it fits Done http://gerrit.cloudera.org:8080/#/c/2598/13/fe/src/main/java/com/cloudera/impala/planner/AnalyticEvalNode.java File fe/src/main/java/com/cloudera/impala/planner/AnalyticEvalNode.java: Line 121: conjuncts_ = orderConjunctsByCost(conjuncts_); > replace this with a Preconditions.checkState(conjuncts_.isEmpty()); because Done http://gerrit.cloudera.org:8080/#/c/2598/13/fe/src/main/java/com/cloudera/impala/planner/HashJoinNode.java File fe/src/main/java/com/cloudera/impala/planner/HashJoinNode.java: Line 102: eqJoinConjuncts_ = orderConjunctsByCost(eqJoinConjuncts_); > save a line and just pass in newEqJoinConjuncts here Done http://gerrit.cloudera.org:8080/#/c/2598/13/fe/src/main/java/com/cloudera/impala/planner/NestedLoopJoinNode.java File fe/src/main/java/com/cloudera/impala/planner/NestedLoopJoinNode.java: Line 70: eqJoinConjuncts_ = orderConjunctsByCost(eqJoinConjuncts_); > also order conjuncts_ Done http://gerrit.cloudera.org:8080/#/c/2598/13/fe/src/main/java/com/cloudera/impala/planner/PlanNode.java File fe/src/main/java/com/cloudera/impala/planner/PlanNode.java: Line 406: conjuncts_ = orderConjunctsByCost(conjuncts_); > Does this lead to duplicate ordering since we already do the ordering in th Most of the PlanNodes don't actually end up calling PlanNode::init, and for those of them that do call PlanNode::init, I don't have a call to orderConjuncts in their own init. So, I'm fairly sure that this doesn't result in any duplicate ordering. However, it seems a little messy to rely on this behavior, so I'll remove this and do all of the sorting in the individual node's init's. Line 638: * As in computeSelecivity, the selectivities are exponentially backed off over the > computeSelectivity -> computeCombinedSelectivity() Done Line 646: List<T> remaining = new ArrayList<T>(); > Lists.newArrayListWithCapacity(conjuncts.size()); Done Line 653: List<T> sortedConjuncts = new ArrayList<T>(); > Lists.newArrayListWithCapacity(conjuncts.size()); Done Line 654: while (remaining.size() > 0) { > while (!remaining.isEmpty()) Done Line 656: T optConjunct = null; > bestConjunct? Done Line 663: sel = Math.pow(Expr.DEFAULT_SELECTIVITY, backoff_exp); > actually, a decent first approach might be to distribute the default select Agreed that DEFAULT_SELECTIVITY is pretty high to be applied to each of the Exprs without estimates. Distributing DEFAULT_SELECTIVITY as suggested seems to make sense, so I implemented that. Line 667: // applying this conjunct to all tuples plus the cost of applying all the > tuples -> rows Done http://gerrit.cloudera.org:8080/#/c/2598/13/fe/src/main/java/com/cloudera/impala/planner/SortNode.java File fe/src/main/java/com/cloudera/impala/planner/SortNode.java: Line 120: conjuncts_ = orderConjunctsByCost(conjuncts_); > We should never assign conjuncts to this node, so you can add a Preconditio Done http://gerrit.cloudera.org:8080/#/c/2598/13/fe/src/main/java/com/cloudera/impala/planner/SubplanNode.java File fe/src/main/java/com/cloudera/impala/planner/SubplanNode.java: Line 67: conjuncts_ = orderConjunctsByCost(conjuncts_); > please move as close as possible to where the conjuncts as assigned (and el Sure, though I actually deleted the call to orderConjuncts for this node based on the 'Preconditions.checkState(conjuncts_.isEmpty()' above. -- To view, visit http://gerrit.cloudera.org:8080/2598 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I02279a26fbc6308ac5eb819d78345fc010469034 Gerrit-PatchSet: 13 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Thomas Tauber-Marshall <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Henry Robinson <[email protected]> Gerrit-Reviewer: Marcel Kornacker <[email protected]> Gerrit-Reviewer: Matthew Jacobs <[email protected]> Gerrit-Reviewer: Mostafa Mokhtar <[email protected]> Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]> Gerrit-HasComments: Yes
