Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-2805: Order filters based on selectivity and cost
......................................................................

Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/2598/1/fe/src/main/java/com/cloudera/impala/analysis/Expr.java
File fe/src/main/java/com/cloudera/impala/analysis/Expr.java:

Line 66:   public final static int ARITHMETIC_OP_COST = 1;
> How did you come up with these constants?

They were chosen arbitrarily so that they give us the general ordering we're going for. Determining these numbers in a more principled way, e.g. by running a benchmark, is a good next step for this work.

http://gerrit.cloudera.org:8080/#/c/2598/1/fe/src/main/java/com/cloudera/impala/planner/PlanNode.java
File fe/src/main/java/com/cloudera/impala/planner/PlanNode.java:

Line 667:   double cost = e.getCost() + (totalCost - e.getCost()) * e.getSelectivity();
> Why is data type not taken into account?

Data type is taken into account when the costs are computed, since these conjuncts may have complex subexpressions with different types (e.g. a compound predicate may contain both a string comparison and an integer comparison).

Currently, this works by giving literals a cost that reflects their complexity: numeric literals have constant cost, while string literals have cost equal to their length. (I realize as I'm typing this that the same should be true of slot refs, so I'll make that change if people are happy with this direction.)

This makes the cost calculation much simpler, since you don't need special cases in every Expr for every possible type of its subexpressions, but it's also wrong sometimes. For example, a function call might take a string parameter without actually being linear in the length of the string.

However, our goal here was not so much to get very precise cost estimates as to differentiate between obviously more and less expensive operations (e.g. string operations are usually more expensive than numeric operations), in a way where it's relatively easy to look at a query plan and understand where the ordering came from.

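To make the discussion above concrete, here is a minimal sketch of how a per-conjunct cost and selectivity could drive the ordering. It is not the code in the patch. The `ConjunctOrderingSketch` and `Conjunct` classes, the `NUMERIC_LITERAL_COST` constant, the `stringLiteralCost` helper, and the greedy loop around the formula are all assumptions made for illustration. Only the cost expression itself is taken from PlanNode.java line 667.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; none of these names come from the Impala code base.
final class ConjunctOrderingSketch {

  // Hypothetical stand-in for an analyzed conjunct (not Impala's Expr).
  static final class Conjunct {
    final String label;
    final double cost;         // estimated cost of evaluating the conjunct on one row
    final double selectivity;  // estimated fraction of rows that pass, in [0, 1]

    Conjunct(String label, double cost, double selectivity) {
      this.label = label;
      this.cost = cost;
      this.selectivity = selectivity;
    }
  }

  // Assumed literal costs mirroring the description in the comment:
  // numeric literals are constant, string literals scale with their length.
  static final double NUMERIC_LITERAL_COST = 1;

  static double stringLiteralCost(String value) {
    return value.length();
  }

  // Greedy ordering (an assumption): repeatedly pick the conjunct that minimizes
  // its own cost plus the cost of the remaining conjuncts, weighted by the
  // fraction of rows that survive it. The input list is not modified.
  static List<Conjunct> orderByCost(List<Conjunct> conjuncts) {
    List<Conjunct> remaining = new ArrayList<>(conjuncts);
    List<Conjunct> ordered = new ArrayList<>();

    double totalCost = 0;
    for (Conjunct c : remaining) {
      totalCost += c.cost;
    }

    while (!remaining.isEmpty()) {
      Conjunct best = null;
      double bestCost = Double.POSITIVE_INFINITY;
      for (Conjunct e : remaining) {
        // Same shape as the expression quoted from PlanNode.java line 667.
        double cost = e.cost + (totalCost - e.cost) * e.selectivity;
        if (cost < bestCost) {
          bestCost = cost;
          best = e;
        }
      }
      remaining.remove(best);
      ordered.add(best);
      totalCost -= best.cost;
    }
    return ordered;
  }
}
```

Under these assumptions, a cheap but unselective integer comparison (cost 2, selectivity 0.5) is placed ahead of an expensive but selective string comparison (cost 10, selectivity 0.1), because 2 + 13 * 0.5 = 8.5 is less than 10 + 5 * 0.1 = 10.5 when the total cost is 15.
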
-- 
To view, visit http://gerrit.cloudera.org:8080/2598
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I02279a26fbc6308ac5eb819d78345fc010469034
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Thomas Tauber-Marshall <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Matthew Jacobs <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]>
Gerrit-HasComments: Yes
