Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-2805: Order filters based on selectivity and cost
......................................................................

Patch Set 1:

For now, we're looking at the performance testing from two angles:

- Updates to the ordering of filters in the existing PlannerTests. For many of these test file updates, you can manually inspect them and see that the ordering change makes sense (e.g. putting integer comparisons before string comparisons). For some, the reason for the reordering is less obvious, usually because predicates that seem like they should come first are missing selectivity estimates and so end up at the end. To make this easier to see, I generated a diff of all of the changes to the .test files, annotated with cost and selectivity values:
  https://drive.google.com/a/cloudera.com/file/d/0B_wAG2vSkAGyUHEwWXBnNnZLdWs/view?usp=sharing

- Manually run tests. A realistic query:

    select count(*) from tpch.lineitem where l_comment like '%a%b%s%' and l_orderkey = 19;

  which took ~2.06s on cdh5-trunk and ~1.65s with this change.

  And a contrived query that shows the effect well:

    select * from functional.alltypesagg where repeat(string_col, 1000) like repeat(string_col, 1000) and id = -1;

  which took ~3.14s on cdh5-trunk and ~0.22s with this change.

--
To view, visit http://gerrit.cloudera.org:8080/2598
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I02279a26fbc6308ac5eb819d78345fc010469034
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Thomas Tauber-Marshall <[email protected]>
Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]>
Gerrit-HasComments: No
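
For readers who want the intuition behind the reordering, here is a rough sketch of one textbook way to order independent predicates by cost and selectivity. It is not the code in this patch: the `Predicate` class, the `order_predicates` and `expected_cost` helpers, and all cost and selectivity numbers are made up for illustration. It sorts by cost divided by the fraction of rows filtered out, and it places predicates with no selectivity estimate last, which mirrors the behaviour described for the PlannerTests above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Predicate:
    name: str
    cost: float                   # relative per-row evaluation cost (illustrative units)
    selectivity: Optional[float]  # estimated fraction of rows that pass; None if unknown


def order_predicates(preds: List[Predicate]) -> List[Predicate]:
    """Order predicates so cheap, highly selective ones run first.

    Predicates with an estimate are sorted by cost / (1 - selectivity).
    Predicates without an estimate go to the end, cheapest first.
    """
    known = [p for p in preds if p.selectivity is not None]
    unknown = [p for p in preds if p.selectivity is None]

    def rank(p: Predicate) -> float:
        filtered = 1.0 - p.selectivity
        # A predicate that filters nothing gains nothing by running early.
        return float("inf") if filtered <= 0 else p.cost / filtered

    return sorted(known, key=rank) + sorted(unknown, key=lambda p: p.cost)


def expected_cost(preds: List[Predicate]) -> float:
    """Expected per-row cost of evaluating preds in order with short-circuiting.

    Unknown selectivities are treated as 1.0 (no rows filtered), an assumption.
    """
    total, surviving = 0.0, 1.0
    for p in preds:
        total += surviving * p.cost
        surviving *= 1.0 if p.selectivity is None else p.selectivity
    return total


# Made-up numbers loosely modelled on the tpch.lineitem query above.
original = [
    Predicate("l_comment like '%a%b%s%'", cost=50.0, selectivity=0.1),
    Predicate("l_orderkey = 19", cost=1.0, selectivity=0.001),
]
reordered = order_predicates(original)
print([p.name for p in reordered])
print(expected_cost(original), expected_cost(reordered))
```

With these invented inputs the integer equality is evaluated first, so the expensive LIKE only runs on the few rows that survive it. That is the same effect the manual timings above are measuring.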
