Marcel Kornacker has posted comments on this change. Change subject: IMPALA-2805: Order filters based on selectivity and cost ......................................................................
Patch Set 1: (22 comments) http://gerrit.cloudera.org:8080/#/c/2598/1/fe/src/main/java/com/cloudera/impala/analysis/AnalyticExpr.java File fe/src/main/java/com/cloudera/impala/analysis/AnalyticExpr.java: Line 789: // AnalyticExprs can never be subexpressions in a conjunct. your comments in Expr.java about cost_ don't point out that this is only computed for things that can end up in conjuncts. augment comment. http://gerrit.cloudera.org:8080/#/c/2598/1/fe/src/main/java/com/cloudera/impala/analysis/BinaryPredicate.java File fe/src/main/java/com/cloudera/impala/analysis/BinaryPredicate.java: Line 353: cost_ = getChildCosts() + BINARY_PREDICATE_COST; differentiate based on type: cost of 1 for fixed-len types, something higher for var-len types (and leave a todo to take average string lengths into account, which is available from column stats). http://gerrit.cloudera.org:8080/#/c/2598/1/fe/src/main/java/com/cloudera/impala/analysis/CaseExpr.java File fe/src/main/java/com/cloudera/impala/analysis/CaseExpr.java: Line 354: get rid of gratuitous blank lines Line 355: cost_ += getChild(i).getCost(); for both cases, you should also add in the max of the then-cost http://gerrit.cloudera.org:8080/#/c/2598/1/fe/src/main/java/com/cloudera/impala/analysis/Expr.java File fe/src/main/java/com/cloudera/impala/analysis/Expr.java: Line 66: public final static int ARITHMETIC_OP_COST = 1; > They were chosen arbitrarily such that they give us the general ordering we instead of pointing this out in a reply to a review comment, point this out in the comment itself ('cost values.' doesn't say much). in general, when people are asking for clarification about some piece of code, the appropriate response in most cases is to clarify in the code itself. Line 67: public final static int BETWEEN_PREDICATE_COST = 1; this is basically two binary predicates, so make it 2*binary_predicate_cost Line 68: public final static int BINARY_PREDICATE_COST = 1; a lot of these reflect the cost of an O(1) computation. might make more sense to have a general O_1_COST in here and then use that in the specific classes. Line 69: public final static int CASE_COST = 5; why? Line 79: public final static int SLOT_REF_COST = 5; that should be 1, this is just a pointer dereference. Line 80: public final static int TIMESTAMP_ARITHMETIC_COST = 1; move all of these into the respective classes timestamp arithmetic is more expensive than standard integer arithmetic. Line 184: // its subexpressions. subexpressions -> children (is the commonly used term) also, let's compute this in analyze() (and point that out in this comment). in general, analyze() fills in semantic information for each Expr class, which you need for the cost computation, because you want to differentiate based on type (for binary predicates, for instance). Line 185: protected int cost_; too generic a name; evalCost_ is more specific comment on meaning of -1 Line 225: public int getCost() { return cost_; } add checkState(isAnalyzed_) here Line 1218: public abstract void computeCost(); cost_ should be computed during analysis, so you don't need a separate function for that http://gerrit.cloudera.org:8080/#/c/2598/1/fe/src/main/java/com/cloudera/impala/analysis/FunctionCallExpr.java File fe/src/main/java/com/cloudera/impala/analysis/FunctionCallExpr.java: Line 506: cost_ = getChildCosts() + FUNCTION_CALL_COST; leave a todo to differentiate based on the specific function. we know that some are very expensive, such as anything with strings. http://gerrit.cloudera.org:8080/#/c/2598/1/fe/src/main/java/com/cloudera/impala/analysis/InPredicate.java File fe/src/main/java/com/cloudera/impala/analysis/InPredicate.java: Line 229: cost_ = getChildCosts() + BINARY_PREDICATE_COST * (children_.size() - 1) + IN_COST; why -1? http://gerrit.cloudera.org:8080/#/c/2598/1/fe/src/main/java/com/cloudera/impala/analysis/LikePredicate.java File fe/src/main/java/com/cloudera/impala/analysis/LikePredicate.java: Line 152: cost_ = getChildCosts() + LIKE_COST; leave todo: differentiate based on pattern http://gerrit.cloudera.org:8080/#/c/2598/1/fe/src/main/java/com/cloudera/impala/analysis/StringLiteral.java File fe/src/main/java/com/cloudera/impala/analysis/StringLiteral.java: Line 171: cost_ = value_.length(); this should also be O(1), because this is really only the cost of materializing the literal (which is a dereference). the operation applied to that literal should then differentiate based on length, if need be. http://gerrit.cloudera.org:8080/#/c/2598/1/fe/src/main/java/com/cloudera/impala/planner/PlanNode.java File fe/src/main/java/com/cloudera/impala/planner/PlanNode.java: Line 409: conjuncts_ = orderConjuncts(conjuncts_); do this right after assignConjuncts() (unless there's a specific reason to do it here). Line 640: if (conjuncts.size() <= 1) { single line http://gerrit.cloudera.org:8080/#/c/2598/1/fe/src/test/java/com/cloudera/impala/planner/OrderConjunctsTest.java File fe/src/test/java/com/cloudera/impala/planner/OrderConjunctsTest.java: Line 29: public class OrderConjunctsTest { instead of writing a unit test for this specific functionality, capture these test cases as planner tests. that way, you test end-to-end functionality (at least as far as planning is concerned). feel free to add a new .test file. Line 95: remove gratuitous blank lines -- To view, visit http://gerrit.cloudera.org:8080/2598 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I02279a26fbc6308ac5eb819d78345fc010469034 Gerrit-PatchSet: 1 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Thomas Tauber-Marshall <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Marcel Kornacker <[email protected]> Gerrit-Reviewer: Matthew Jacobs <[email protected]> Gerrit-Reviewer: Mostafa Mokhtar <[email protected]> Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]> Gerrit-HasComments: Yes
