Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-2805: Order filters based on selectivity and cost
......................................................................


Patch Set 1:

(22 comments)

http://gerrit.cloudera.org:8080/#/c/2598/1/fe/src/main/java/com/cloudera/impala/analysis/AnalyticExpr.java
File fe/src/main/java/com/cloudera/impala/analysis/AnalyticExpr.java:

Line 789:     // AnalyticExprs can never be subexpressions in a conjunct.
your comments in Expr.java about cost_ don't point out that this is only 
computed for things that can end up in conjuncts.  augment comment.


http://gerrit.cloudera.org:8080/#/c/2598/1/fe/src/main/java/com/cloudera/impala/analysis/BinaryPredicate.java
File fe/src/main/java/com/cloudera/impala/analysis/BinaryPredicate.java:

Line 353:     cost_ = getChildCosts() + BINARY_PREDICATE_COST;
differentiate based on type: cost of 1 for fixed-len types, something higher 
for var-len types (and leave a todo to take average string lengths into 
account, which is available from column stats).


http://gerrit.cloudera.org:8080/#/c/2598/1/fe/src/main/java/com/cloudera/impala/analysis/CaseExpr.java
File fe/src/main/java/com/cloudera/impala/analysis/CaseExpr.java:

Line 354: 
get rid of gratuitous blank lines


Line 355:         cost_ += getChild(i).getCost();
for both cases, you should also add in the max of the then-cost


http://gerrit.cloudera.org:8080/#/c/2598/1/fe/src/main/java/com/cloudera/impala/analysis/Expr.java
File fe/src/main/java/com/cloudera/impala/analysis/Expr.java:

Line 66:   public final static int ARITHMETIC_OP_COST = 1;
> They were chosen arbitrarily such that they give us the general ordering we
instead of pointing this out in a reply to a review comment, point this out in 
the comment itself ('cost values.' doesn't say much).

in general, when people are asking for clarification about some piece of code, 
the appropriate response in most cases is to clarify in the code itself.


Line 67:   public final static int BETWEEN_PREDICATE_COST = 1;
this is basically two binary predicates, so make it 2*binary_predicate_cost


Line 68:   public final static int BINARY_PREDICATE_COST = 1;
a lot of these reflect the cost of an O(1) computation. might make more sense 
to have a general O_1_COST in here and then use that in the specific classes.


Line 69:   public final static int CASE_COST = 5;
why?


Line 79:   public final static int SLOT_REF_COST = 5;
that should be 1, this is just a pointer dereference.


Line 80:   public final static int TIMESTAMP_ARITHMETIC_COST = 1;
move all of these into the respective classes

timestamp arithmetic is more expensive than standard integer arithmetic.


Line 184:   // its subexpressions.
subexpressions -> children (is the commonly used term)

also, let's compute this in analyze() (and point that out in this comment).

in general, analyze() fills in semantic information for each Expr class, which 
you need for the cost computation, because you want to differentiate based on 
type (for binary predicates, for instance).


Line 185:   protected int cost_;
too generic a name; evalCost_ is more specific

comment on meaning of -1


Line 225:   public int getCost() { return cost_; }
add checkState(isAnalyzed_) here


Line 1218:   public abstract void computeCost();
cost_ should be computed during analysis, so you don't need a separate function 
for that


http://gerrit.cloudera.org:8080/#/c/2598/1/fe/src/main/java/com/cloudera/impala/analysis/FunctionCallExpr.java
File fe/src/main/java/com/cloudera/impala/analysis/FunctionCallExpr.java:

Line 506:     cost_ = getChildCosts() + FUNCTION_CALL_COST;
leave a todo to differentiate based on the specific function. we know that some 
are very expensive, such as anything with strings.


http://gerrit.cloudera.org:8080/#/c/2598/1/fe/src/main/java/com/cloudera/impala/analysis/InPredicate.java
File fe/src/main/java/com/cloudera/impala/analysis/InPredicate.java:

Line 229:     cost_ = getChildCosts() + BINARY_PREDICATE_COST * 
(children_.size() - 1) + IN_COST;
why -1?


http://gerrit.cloudera.org:8080/#/c/2598/1/fe/src/main/java/com/cloudera/impala/analysis/LikePredicate.java
File fe/src/main/java/com/cloudera/impala/analysis/LikePredicate.java:

Line 152:     cost_ = getChildCosts() + LIKE_COST;
leave todo: differentiate based on pattern


http://gerrit.cloudera.org:8080/#/c/2598/1/fe/src/main/java/com/cloudera/impala/analysis/StringLiteral.java
File fe/src/main/java/com/cloudera/impala/analysis/StringLiteral.java:

Line 171:     cost_ = value_.length();
this should also be O(1), because this is really only the cost of materializing 
the literal (which is a dereference). the operation applied to that literal 
should then differentiate based on length, if need be.


http://gerrit.cloudera.org:8080/#/c/2598/1/fe/src/main/java/com/cloudera/impala/planner/PlanNode.java
File fe/src/main/java/com/cloudera/impala/planner/PlanNode.java:

Line 409:     conjuncts_ = orderConjuncts(conjuncts_);
do this right after assignConjuncts() (unless there's a specific reason to do 
it here).


Line 640:     if (conjuncts.size() <= 1) {
single line


http://gerrit.cloudera.org:8080/#/c/2598/1/fe/src/test/java/com/cloudera/impala/planner/OrderConjunctsTest.java
File fe/src/test/java/com/cloudera/impala/planner/OrderConjunctsTest.java:

Line 29: public class OrderConjunctsTest {
instead of writing a unit test for this specific functionality, capture these 
test cases as planner tests. that way, you test end-to-end functionality (at 
least as far as planning is concerned). feel free to add a new .test file.


Line 95: 
remove gratuitous blank lines


-- 
To view, visit http://gerrit.cloudera.org:8080/2598
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I02279a26fbc6308ac5eb819d78345fc010469034
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Thomas Tauber-Marshall <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Matthew Jacobs <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]>
Gerrit-HasComments: Yes

Reply via email to