[ https://issues.apache.org/jira/browse/IMPALA-8165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Greg Rahn updated IMPALA-8165: ------------------------------ Labels: planner tpc-ds (was: planner) > Planner does not push through predicates when there is a disjunction > -------------------------------------------------------------------- > > Key: IMPALA-8165 > URL: https://issues.apache.org/jira/browse/IMPALA-8165 > Project: IMPALA > Issue Type: Bug > Components: Frontend > Reporter: Greg Rahn > Priority: Major > Labels: planner, tpc-ds > > If we take a simple query like such: > {noformat} > select avg(ss_quantity) > from store_sales > join household_demographics on (ss_hdemo_sk=hd_demo_sk) > where (ss_sales_price between 0 and 100 and hd_dep_count = 1) > or (ss_sales_price between 100 and 200 and hd_dep_count = 2); > {noformat} > and look at the plan we see that neither table scan has any predicates pushed > to it, the only filter is in the join > (from impalad version 2.12.0-cdh5.16.x RELEASE (build > 3f68649c7bf8a01fb6ba0cbe35dd2492adb836dd) > {noformat} > PLAN-ROOT SINK > | > 06:AGGREGATE [FINALIZE] > | output: avg:merge(ss_quantity) > | > 05:EXCHANGE [UNPARTITIONED] > | > 03:AGGREGATE > | output: avg(ss_quantity) > | > 02:HASH JOIN [INNER JOIN, BROADCAST] > | hash predicates: ss_hdemo_sk = hd_demo_sk > | other predicates: (ss_sales_price >= 0 AND ss_sales_price <= 100 AND > hd_dep_count = 1) OR (ss_sales_price >= 100 AND ss_sales_price <= 200 AND > hd_dep_count = 2) > | runtime filters: RF000 <- hd_demo_sk > | > |--04:EXCHANGE [BROADCAST] > | | > | 01:SCAN HDFS [tpcds_1000_parquet.household_demographics] > | partitions=1/1 files=1 size=41.08KB > | > 00:SCAN HDFS [tpcds_1000_parquet.store_sales] > partitions=1824/1824 files=1824 size=189.24GB > runtime filters: RF000 -> ss_hdemo_sk > {noformat} > If we look at PostgreSQL 11.1, we'll see that not only does the join filter, > but the table scan has the appropriate filters pushed to it. > {noformat} > Finalize Aggregate (cost=67549.69..67549.70 rows=1 width=32) > -> Gather (cost=67549.47..67549.68 rows=2 width=32) > Workers Planned: 2 > -> Partial Aggregate (cost=66549.47..66549.48 rows=1 width=32) > -> Hash Join (cost=113.12..66547.64 rows=734 width=4) > Hash Cond: (store_sales.ss_hdemo_sk = > household_demographics.hd_demo_sk) > Join Filter: (((store_sales.ss_sales_price >= > '0'::numeric) AND > (store_sales.ss_sales_price <= > '100'::numeric) AND > (household_demographics.hd_dep_count = > 1)) OR > ((store_sales.ss_sales_price >= > '100'::numeric) AND > (store_sales.ss_sales_price <= > '200'::numeric) AND > (household_demographics.hd_dep_count = > 2))) > -> Parallel Seq Scan on store_sales > (cost=0.00..66343.20 rows=7305 width=22) > Filter: (((ss_sales_price >= '0'::numeric) AND > (ss_sales_price <= '100'::numeric)) OR > ((ss_sales_price >= '100'::numeric) AND > (ss_sales_price <= '200'::numeric))) > -> Hash (cost=112.62..112.62 rows=40 width=8) > -> Seq Scan on household_demographics > (cost=0.00..112.62 rows=40 width=8) > Filter: ((hd_dep_count = 1) OR (hd_dep_count > = 2)) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org