Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-2805: Order filters based on selectivity and cost
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/2598/1/fe/src/main/java/com/cloudera/impala/analysis/Expr.java
File fe/src/main/java/com/cloudera/impala/analysis/Expr.java:

Line 66:   public final static int ARITHMETIC_OP_COST = 1;
> How did you come up with these constants?
They were chosen arbitrarily such that they give us the general ordering we're 
going for.

Adding a more principled way of determining these numbers, e.g. by running a 
benchmark, would be a good next step for this work.


http://gerrit.cloudera.org:8080/#/c/2598/1/fe/src/main/java/com/cloudera/impala/planner/PlanNode.java
File fe/src/main/java/com/cloudera/impala/planner/PlanNode.java:

Line 667:         double cost = e.getCost() + (totalCost - e.getCost()) * e.getSelectivity();
> Why is data type not taken into account?
Data type is taken into account when the costs are computed, since these 
conjuncts may have complex subexpressions with different types (e.g. a 
compound predicate with a string comparison and an integer comparison).

Currently, the way that works is that literals have a cost that reflects their 
complexity, e.g. numeric literals have constant cost while string literals 
have cost equal to their length. (I realize as I'm typing this that the same 
should be true of slot refs, so I'll make that change if people are happy with 
this direction.)
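
To make the literal costs concrete, here is a minimal sketch of the idea. The 
class, method, and constant names are hypothetical and are not the ones in the 
patch, and the rule for combining operand costs in a comparison is an 
assumption made only for illustration.

```java
// Hypothetical sketch: none of these names come from the patch itself.
class LiteralCostSketch {
    // Assumed constant cost shared by every numeric literal.
    static final int NUMERIC_LITERAL_COST = 1;
    // Assumed fixed cost of the comparison operator itself.
    static final int COMPARISON_OP_COST = 1;

    // Numeric literals have the same cost regardless of their value.
    static int numericLiteralCost(long value) {
        return NUMERIC_LITERAL_COST;
    }

    // String literals have a cost equal to their length.
    static int stringLiteralCost(String value) {
        return value.length();
    }

    // Assumption: a comparison costs the sum of its operand costs plus the
    // operator cost, so the operand type affects the result without any
    // per-type special case in the comparison.
    static int comparisonCost(int lhsCost, int rhsCost) {
        return lhsCost + rhsCost + COMPARISON_OP_COST;
    }
}
```

Under these assumptions, a comparison against a long string literal comes out 
more expensive than the same comparison against an integer literal, which is 
the general ordering we want.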

This makes the calculation of the costs much simpler, since you don't need 
special cases in every Expr for every possible type of its subexpressions, but 
it's also wrong sometimes, e.g. you might have a function call that takes a 
string as a parameter but that isn't actually linear in the length of the 
string.

However, our goal here was not so much to get very precise estimates of the 
costs as to differentiate between obviously more or less expensive operations 
(e.g. string operations are usually more expensive than numeric operations) in 
a way where it's relatively easy to look at a query plan and understand where 
the ordering came from.
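
For reference, here is a rough sketch of how the formula on line 667 can drive 
the ordering. Only the cost expression itself is taken from the patch. The 
Conjunct class, the greedy loop around the formula, and the reading of 
totalCost as the summed cost of the conjuncts not yet placed are assumptions 
for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: only the cost formula is taken from the patch.
class ConjunctOrderSketch {
    // Stand-in for an Expr with a cost and a selectivity in [0, 1].
    static final class Conjunct {
        final String name;
        final double cost;
        final double selectivity;

        Conjunct(String name, double cost, double selectivity) {
            this.name = name;
            this.cost = cost;
            this.selectivity = selectivity;
        }
    }

    // Greedily pick the conjunct that minimizes its own cost plus the cost of
    // everything after it, weighted by the fraction of rows it lets through.
    static List<Conjunct> orderByCost(List<Conjunct> conjuncts) {
        List<Conjunct> remaining = new ArrayList<>(conjuncts);
        List<Conjunct> ordered = new ArrayList<>();
        double totalCost = 0;
        for (Conjunct c : remaining) totalCost += c.cost;

        while (!remaining.isEmpty()) {
            Conjunct best = null;
            double bestCost = Double.POSITIVE_INFINITY;
            for (Conjunct e : remaining) {
                // Same shape as the expression on line 667.
                double cost = e.cost + (totalCost - e.cost) * e.selectivity;
                if (cost < bestCost) {
                    bestCost = cost;
                    best = e;
                }
            }
            ordered.add(best);
            remaining.remove(best);
            totalCost -= best.cost;
        }
        return ordered;
    }
}
```

With a cheap conjunct of cost 1 and selectivity 0.5 and an expensive one of 
cost 10 and selectivity 0.1, the cheap one scores 1 + 10 * 0.5 = 6 and the 
expensive one 10 + 1 * 0.1 = 10.1, so the cheap one is evaluated first. If the 
expensive conjunct were selective enough (say 0.01) and the cheap one barely 
selective (say 0.99), the order would flip.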


-- 
To view, visit http://gerrit.cloudera.org:8080/2598
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I02279a26fbc6308ac5eb819d78345fc010469034
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Thomas Tauber-Marshall <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Matthew Jacobs <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]>
Gerrit-HasComments: Yes
