Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-2805: Order conjuncts based on selectivity and cost
......................................................................


Patch Set 13:

(20 comments)

Updated to the latest review and discussion. Currently running perf tests and 
I'll post the results when I get them. Still need to write more functional 
tests, but I'd rather wait to be sure we are happy with our cost model before 
doing that.

http://gerrit.cloudera.org:8080/#/c/2598/13/fe/src/main/java/com/cloudera/impala/analysis/CaseExpr.java
File fe/src/main/java/com/cloudera/impala/analysis/CaseExpr.java:

Line 328:     // Compute costas the sum of evaluating all of the WHEN exprs, 
plus
> typo: costas
Done


http://gerrit.cloudera.org:8080/#/c/2598/13/fe/src/main/java/com/cloudera/impala/analysis/CastExpr.java
File fe/src/main/java/com/cloudera/impala/analysis/CastExpr.java:

Line 205:     if (getChild(0).hasCost()) evalCost_ = getChild(0).getCost() + 
CAST_COST;
> can't this go in analyze() below? then you can remove the extra cost comput
Done


http://gerrit.cloudera.org:8080/#/c/2598/13/fe/src/main/java/com/cloudera/impala/analysis/ExistsPredicate.java
File fe/src/main/java/com/cloudera/impala/analysis/ExistsPredicate.java:

Line 60:     if (getChild(0).hasCost()) evalCost_ = getChild(0).getCost() + 
EXISTS_COST;
> remove
Done


http://gerrit.cloudera.org:8080/#/c/2598/13/fe/src/main/java/com/cloudera/impala/analysis/Expr.java
File fe/src/main/java/com/cloudera/impala/analysis/Expr.java:

Line 76:   public final static float EXISTS_COST = 1;
> Remove because EXISTS is rewritten into a join or rejected.
Done


Line 187:   // its children. Set during analysis. Set to -1 if the cost could 
not be estimated.
> Are we really setting the cost to -1 in cases where the cost could not be e
I updated the comment to  (hopefully) clarify this.


Line 1236:   protected int getChildCosts() {
> Is cost supposed to be an int or a float? You sometimes use int and sometim
I had originally used int, but Marcel pointed out that float would leave us 
more flexibility in the future, eg. if we start basing cost values off of a 
benchmark. I just missed this spot when refactoring everything to use floats.


Line 1238:     for (Expr child : children_) {
> single line if it fits
Done


http://gerrit.cloudera.org:8080/#/c/2598/13/fe/src/main/java/com/cloudera/impala/planner/AnalyticEvalNode.java
File fe/src/main/java/com/cloudera/impala/planner/AnalyticEvalNode.java:

Line 121:     conjuncts_ = orderConjunctsByCost(conjuncts_);
> replace this with a Preconditions.checkState(conjuncts_.isEmpty()); because
Done


http://gerrit.cloudera.org:8080/#/c/2598/13/fe/src/main/java/com/cloudera/impala/planner/HashJoinNode.java
File fe/src/main/java/com/cloudera/impala/planner/HashJoinNode.java:

Line 102:     eqJoinConjuncts_ = orderConjunctsByCost(eqJoinConjuncts_);
> save a line and just pass in newEqJoinConjuncts here
Done


http://gerrit.cloudera.org:8080/#/c/2598/13/fe/src/main/java/com/cloudera/impala/planner/NestedLoopJoinNode.java
File fe/src/main/java/com/cloudera/impala/planner/NestedLoopJoinNode.java:

Line 70:     eqJoinConjuncts_ = orderConjunctsByCost(eqJoinConjuncts_);
> also order conjuncts_
Done


http://gerrit.cloudera.org:8080/#/c/2598/13/fe/src/main/java/com/cloudera/impala/planner/PlanNode.java
File fe/src/main/java/com/cloudera/impala/planner/PlanNode.java:

Line 406:     conjuncts_ = orderConjunctsByCost(conjuncts_);
> Does this lead to duplicate ordering since we already do the ordering in th
Most of the PlanNodes don't actually end up calling PlanNode::init, and for 
those of them that do call PlanNode::init, I don't have a call to 
orderConjuncts in their own init.

So, I'm fairly sure that this doesn't result in any duplicate ordering. 
However, it seems a little messy to rely on this behavior, so I'll remove this 
and do all of the sorting in the individual node's init's.


Line 638:    * As in computeSelecivity, the selectivities are exponentially 
backed off over the
> computeSelectivity -> computeCombinedSelectivity()
Done


Line 646:     List<T> remaining = new ArrayList<T>();
> Lists.newArrayListWithCapacity(conjuncts.size());
Done


Line 653:     List<T> sortedConjuncts = new ArrayList<T>();
> Lists.newArrayListWithCapacity(conjuncts.size());
Done


Line 654:     while (remaining.size() > 0) {
> while (!remaining.isEmpty())
Done


Line 656:       T optConjunct =  null;
> bestConjunct?
Done


Line 663:           sel = Math.pow(Expr.DEFAULT_SELECTIVITY, backoff_exp);
> actually, a decent first approach might be to distribute the default select
Agreed that DEFAULT_SELECTIVITY is pretty high to be applied to each of the 
Exprs without estimates.

Distributing DEFAULT_SELECTIVITY as suggested seems to make sense, so I 
implemented that.


Line 667:         // applying this conjunct to all tuples plus the cost of 
applying all the
> tuples -> rows
Done


http://gerrit.cloudera.org:8080/#/c/2598/13/fe/src/main/java/com/cloudera/impala/planner/SortNode.java
File fe/src/main/java/com/cloudera/impala/planner/SortNode.java:

Line 120:     conjuncts_ = orderConjunctsByCost(conjuncts_);
> We should never assign conjuncts to this node, so you can add a Preconditio
Done


http://gerrit.cloudera.org:8080/#/c/2598/13/fe/src/main/java/com/cloudera/impala/planner/SubplanNode.java
File fe/src/main/java/com/cloudera/impala/planner/SubplanNode.java:

Line 67:     conjuncts_ = orderConjunctsByCost(conjuncts_);
> please move as close as possible to where the conjuncts as assigned (and el
Sure, though I actually deleted the call to orderConjuncts for this node based 
on the 'Preconditions.checkState(conjuncts_.isEmpty()' above.


-- 
To view, visit http://gerrit.cloudera.org:8080/2598
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I02279a26fbc6308ac5eb819d78345fc010469034
Gerrit-PatchSet: 13
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Thomas Tauber-Marshall <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Henry Robinson <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Matthew Jacobs <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]>
Gerrit-HasComments: Yes

Reply via email to