Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-2805: Order filters based on selectivity and cost
......................................................................


Patch Set 1:

For now, we're evaluating performance from two angles:

- Updates to the ordering of filters in the existing PlannerTests.

For many of these test file updates, you can inspect them manually and see that
the ordering change makes sense (e.g. putting integer comparisons before string
comparisons).

For others, the reason for the reordering is less obvious, usually because
predicates that seem like they should come first are missing selectivity
estimates and therefore end up at the end. To make this easier to see, I
generated a diff of all the changes to the .test files, annotated with cost and
selectivity values:
https://drive.google.com/a/cloudera.com/file/d/0B_wAG2vSkAGyUHEwWXBnNnZLdWs/view?usp=sharing
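The ordering idea can be sketched in a few lines of Python. This is a minimal sketch, not the code in this patch: it assumes a hypothetical `Conjunct` record carrying a per-row cost and an optional selectivity, ranks predicates that have an estimate with the classic (selectivity - 1) / cost heuristic (lowest rank first), and mirrors the behavior described above by sorting predicates without a selectivity estimate to the end. The cost and selectivity numbers and the `my_udf` predicate are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Conjunct:
    """Hypothetical predicate with planner estimates (not Impala's Expr class)."""
    sql: str
    cost: float                   # assumed per-row evaluation cost, must be > 0
    selectivity: Optional[float]  # fraction of rows that pass; None if unknown


def order_conjuncts(conjuncts: List[Conjunct]) -> List[Conjunct]:
    """Return conjuncts sorted so cheap, selective predicates come first.

    Predicates with a selectivity estimate are ranked by
    (selectivity - 1) / cost, ascending: lower cost and lower selectivity
    both make the rank more negative. Predicates without an estimate are
    placed after all of them, ordered by cost.
    """
    def key(c: Conjunct):
        if c.selectivity is None:
            return (1, c.cost)
        return (0, (c.selectivity - 1.0) / c.cost)

    return sorted(conjuncts, key=key)


if __name__ == "__main__":
    # Illustrative estimates only; these are not values from the planner.
    preds = [
        Conjunct("my_udf(l_shipmode) = 'AIR'", cost=1.0, selectivity=None),
        Conjunct("l_comment like '%a%b%s%'", cost=10.0, selectivity=0.1),
        Conjunct("l_orderkey = 19", cost=1.0, selectivity=0.1),
    ]
    for c in order_conjuncts(preds):
        print(c.sql)
```

With these made-up numbers the integer comparison is ranked ahead of the string LIKE, and the cheap `my_udf` predicate still lands last because it has no selectivity estimate, which is the kind of non-obvious reordering the annotated diff is meant to explain.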

- Manually run tests.

A realistic query:
select
    count(*)
from
    tpch.lineitem
where
    l_comment like '%a%b%s%' and l_orderkey = 19;
which took ~2.06s on cdh5-trunk and ~1.65s with this change.

And a contrived query that shows the effect well:
select
    *
from
    functional.alltypesagg
where
    repeat(string_col, 1000) like repeat(string_col, 1000) and id = -1;
which took ~3.14s on cdh5-trunk and ~0.22s with this change, since the cheap
id = -1 check can now reject rows before the expensive LIKE is evaluated.
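The speedup on the contrived query comes from short-circuiting: assuming the conjuncts are ANDed and evaluated in order, a row rejected by an early conjunct never reaches the later ones. The Python sketch below only models that effect under stated assumptions: rows are hypothetical dicts, string equality stands in for the self-matching LIKE, and it counts predicate evaluations rather than measuring time or modeling Impala's execution engine.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Predicate = Callable[[Row], bool]


def counted(name: str, pred: Predicate, counts: Dict[str, int]) -> Predicate:
    """Wrap a predicate so every evaluation is tallied in counts[name]."""
    def wrapped(row: Row) -> bool:
        counts[name] = counts.get(name, 0) + 1
        return pred(row)
    return wrapped


def scan(rows: List[Row], conjuncts: List[Predicate]) -> int:
    """Return the number of rows passing every conjunct.

    all() stops at the first conjunct that returns False, which models
    short-circuit evaluation of an AND of predicates.
    """
    return sum(1 for row in rows if all(p(row) for p in conjuncts))


def run(cheap_first: bool, rows: List[Row]) -> Tuple[int, Dict[str, int]]:
    """Scan rows with the two predicates in the given order; return (matches, counts)."""
    counts: Dict[str, int] = {}
    # Stand-in for repeat(string_col, 1000) like repeat(string_col, 1000):
    # always true for these rows, but builds two 1000x strings per call.
    expensive = counted(
        "expensive",
        lambda r: str(r["string_col"]) * 1000 == str(r["string_col"]) * 1000,
        counts,
    )
    # Stand-in for id = -1: cheap, and false for every row below.
    cheap = counted("cheap", lambda r: r["id"] == -1, counts)
    conjuncts = [cheap, expensive] if cheap_first else [expensive, cheap]
    return scan(rows, conjuncts), counts


if __name__ == "__main__":
    rows: List[Row] = [{"id": i, "string_col": str(i % 10)} for i in range(1000)]
    print(run(cheap_first=False, rows=rows))
    print(run(cheap_first=True, rows=rows))
```

With the expensive predicate first, both predicates run 1,000 times; with the cheap one first, the expensive predicate never runs, and both orders return zero matching rows.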

-- 
To view, visit http://gerrit.cloudera.org:8080/2598

Gerrit-MessageType: comment
Gerrit-Change-Id: I02279a26fbc6308ac5eb819d78345fc010469034
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Thomas Tauber-Marshall <[email protected]>
Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]>
Gerrit-HasComments: No
