Alex Behm has posted comments on this change.

Change subject: IMPALA-2805: Order conjuncts based on selectivity and cost
......................................................................


Patch Set 13:

(22 comments)

http://gerrit.cloudera.org:8080/#/c/2598/13/fe/src/main/java/com/cloudera/impala/analysis/CaseExpr.java
File fe/src/main/java/com/cloudera/impala/analysis/CaseExpr.java:

Line 328:     // Compute costas the sum of evaluating all of the WHEN exprs, 
plus
typo: costas


http://gerrit.cloudera.org:8080/#/c/2598/13/fe/src/main/java/com/cloudera/impala/analysis/CastExpr.java
File fe/src/main/java/com/cloudera/impala/analysis/CastExpr.java:

Line 205:     if (getChild(0).hasCost()) evalCost_ = getChild(0).getCost() + 
CAST_COST;
can't this go in analyze() below? then you can remove the extra cost 
computation in the "pre-analyzed" c'tor


http://gerrit.cloudera.org:8080/#/c/2598/13/fe/src/main/java/com/cloudera/impala/analysis/ExistsPredicate.java
File fe/src/main/java/com/cloudera/impala/analysis/ExistsPredicate.java:

Line 60:     if (getChild(0).hasCost()) evalCost_ = getChild(0).getCost() + 
EXISTS_COST;
remove


http://gerrit.cloudera.org:8080/#/c/2598/13/fe/src/main/java/com/cloudera/impala/analysis/Expr.java
File fe/src/main/java/com/cloudera/impala/analysis/Expr.java:

Line 76:   public final static float EXISTS_COST = 1;
Remove because EXISTS is rewritten into a join or rejected.


Line 187:   // its children. Set during analysis. Set to -1 if the cost could 
not be estimated.
Are we really setting the cost to -1 in cases where the cost could not be 
estimated? In PlanNode.orderConjunctsByCost() we have a Preconditions check 
that asserts all conjuncts need to have a cost.

Seems like we should either treat those with -1 specially when ordering, or 
this comment is not accurate and should be fixed.


Line 1236:   protected int getChildCosts() {
Is cost supposed to be an int or a float? You sometimes use int and sometimes 
float (e.g., in the evalCost_ member you use float but here you use int)

Sticking to int seems fine to me, unless there is a reason not to.


Line 1238:     for (Expr child : children_) {
single line if it fits


http://gerrit.cloudera.org:8080/#/c/2598/13/fe/src/main/java/com/cloudera/impala/planner/AnalyticEvalNode.java
File fe/src/main/java/com/cloudera/impala/planner/AnalyticEvalNode.java:

Line 121:     conjuncts_ = orderConjunctsByCost(conjuncts_);
replace this with a Preconditions.checkState(conjuncts_.isEmpty()); because we 
should never assign conjuncts to an AnalyticEvalNode


http://gerrit.cloudera.org:8080/#/c/2598/13/fe/src/main/java/com/cloudera/impala/planner/HashJoinNode.java
File fe/src/main/java/com/cloudera/impala/planner/HashJoinNode.java:

Line 102:     eqJoinConjuncts_ = orderConjunctsByCost(eqJoinConjuncts_);
save a line and just pass in newEqJoinConjuncts here


http://gerrit.cloudera.org:8080/#/c/2598/13/fe/src/main/java/com/cloudera/impala/planner/NestedLoopJoinNode.java
File fe/src/main/java/com/cloudera/impala/planner/NestedLoopJoinNode.java:

Line 70:     eqJoinConjuncts_ = orderConjunctsByCost(eqJoinConjuncts_);
also order conjuncts_

You might want to add a helper function to JoinNode that orders 
eqJoinConjuncts_, otherJoinConjuncts_, and conjuncts_


http://gerrit.cloudera.org:8080/#/c/2598/13/fe/src/main/java/com/cloudera/impala/planner/PlanNode.java
File fe/src/main/java/com/cloudera/impala/planner/PlanNode.java:

Line 406:     conjuncts_ = orderConjunctsByCost(conjuncts_);
Does this lead to duplicate ordering since we already do the ordering in the 
individual plan node's init?

We might as well have all PlanNodes invoke themselves. It seems cleaner to me 
to separate the conjunct assignment from the ordering, since many plan nodes 
have some custom logic when/what to order.


Line 638:    * As in computeSelecivity, the selectivities are exponentially 
backed off over the
computeSelectivity -> computeCombinedSelectivity()


Line 646:     List<T> remaining = new ArrayList<T>();
Lists.newArrayListWithCapacity(conjuncts.size());


Line 653:     List<T> sortedConjuncts = new ArrayList<T>();
Lists.newArrayListWithCapacity(conjuncts.size());


Line 654:     while (remaining.size() > 0) {
while (!remaining.isEmpty())


Line 656:       T optConjunct =  null;
bestConjunct?

optConjunct could be misleading (opt == optional?)


Line 657:       double backoff_exp = 1.0 / (double) (sortedConjuncts.size() + 
1);
backoffExponent

Why is the backoff even needed? We're trying to get a relative ordering of the 
conjuncts and the same factor is applied to the selectivity of all remaining 
candidates below, so could we not just omit this backoff? It doesn't seem like 
the backoff would change the ordering.


Line 667:         // applying this conjunct to all tuples plus the cost of 
applying all the
tuples -> rows

(and elsewhere)


http://gerrit.cloudera.org:8080/#/c/2598/13/fe/src/main/java/com/cloudera/impala/planner/SortNode.java
File fe/src/main/java/com/cloudera/impala/planner/SortNode.java:

Line 120:     conjuncts_ = orderConjunctsByCost(conjuncts_);
We should never assign conjuncts to this node, so you can add a 
Preconditions.checkState() after assignConjuncts() that the conjuncts_ are 
empty. Lmk if I'm missing something.


http://gerrit.cloudera.org:8080/#/c/2598/13/fe/src/main/java/com/cloudera/impala/planner/SubplanNode.java
File fe/src/main/java/com/cloudera/impala/planner/SubplanNode.java:

Line 67:     conjuncts_ = orderConjunctsByCost(conjuncts_);
please move as close as possible to where the conjuncts as assigned (and 
elsewhere)


http://gerrit.cloudera.org:8080/#/c/2598/13/testdata/workloads/functional-planner/queries/PlannerTest/conjunct-ordering.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/conjunct-ordering.test:

Line 2: select *
I think we need more tests here to cover the different plan nodes and the 
different types of conjuncts. Lmk if you need help coming up with queries to 
cover the different cases.


Line 11: select *
For each test add a brief comment what it is covering.


-- 
To view, visit http://gerrit.cloudera.org:8080/2598
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I02279a26fbc6308ac5eb819d78345fc010469034
Gerrit-PatchSet: 13
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Thomas Tauber-Marshall <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Henry Robinson <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Matthew Jacobs <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]>
Gerrit-HasComments: Yes

Reply via email to