[jira] [Updated] (IMPALA-8165) Planner does not push through predicates when there is a disjunction

Greg Rahn (JIRA) Tue, 05 Feb 2019 17:04:30 -0800


     [ 
https://issues.apache.org/jira/browse/IMPALA-8165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Greg Rahn updated IMPALA-8165:
------------------------------
    Description: 
If we take a simple query like such:

{noformat}
select avg(ss_quantity)
from store_sales 
join household_demographics on (ss_hdemo_sk=hd_demo_sk)
where (ss_sales_price between 0 and 100 and hd_dep_count = 1)
 or (ss_sales_price between 100 and 200 and hd_dep_count = 2);
{noformat}

and look at the plan we see that neither table scan has any predicates pushed 
to it, the only filter is in the join

(from impalad version 2.12.0-cdh5.16.x RELEASE (build 
3f68649c7bf8a01fb6ba0cbe35dd2492adb836dd)
{noformat}
PLAN-ROOT SINK
|
06:AGGREGATE [FINALIZE]
| output: avg:merge(ss_quantity)
|
05:EXCHANGE [UNPARTITIONED]
|
03:AGGREGATE
| output: avg(ss_quantity)
|
02:HASH JOIN [INNER JOIN, BROADCAST]
| hash predicates: ss_hdemo_sk = hd_demo_sk
| other predicates: (ss_sales_price >= 0 AND ss_sales_price <= 100 AND 
hd_dep_count = 1) OR 
|                             (ss_sales_price >= 100 AND ss_sales_price <= 200 
AND hd_dep_count = 2)
| runtime filters: RF000 <- hd_demo_sk
|
|--04:EXCHANGE [BROADCAST]
| |
| 01:SCAN HDFS [tpcds_1000_parquet.household_demographics]
| partitions=1/1 files=1 size=41.08KB
|
00:SCAN HDFS [tpcds_1000_parquet.store_sales]
 partitions=1824/1824 files=1824 size=189.24GB
 runtime filters: RF000 -> ss_hdemo_sk
{noformat}

If we look at PostgreSQL 11.1, we'll see that not only does the join filter, 
but the table scan has the appropriate filters pushed to it.

{noformat}
 Finalize Aggregate  (cost=67549.69..67549.70 rows=1 width=32)
   ->  Gather  (cost=67549.47..67549.68 rows=2 width=32)
         Workers Planned: 2
         ->  Partial Aggregate  (cost=66549.47..66549.48 rows=1 width=32)
               ->  Hash Join  (cost=113.12..66547.64 rows=734 width=4)
                     Hash Cond: (store_sales.ss_hdemo_sk = 
household_demographics.hd_demo_sk)
                     Join Filter: (((store_sales.ss_sales_price >= 
'0'::numeric) AND 
                                    (store_sales.ss_sales_price <= 
'100'::numeric) AND 
                                    (household_demographics.hd_dep_count = 1)) 
OR 
                                   ((store_sales.ss_sales_price >= 
'100'::numeric) AND 
                                    (store_sales.ss_sales_price <= 
'200'::numeric) AND 
                                    (household_demographics.hd_dep_count = 2)))
                     ->  Parallel Seq Scan on store_sales  (cost=0.00..66343.20 
rows=7305 width=22)
                           Filter: (((ss_sales_price >= '0'::numeric) AND 
(ss_sales_price <= '100'::numeric)) OR 
                                    ((ss_sales_price >= '100'::numeric) AND 
(ss_sales_price <= '200'::numeric)))
                     ->  Hash  (cost=112.62..112.62 rows=40 width=8)
                           ->  Seq Scan on household_demographics  
(cost=0.00..112.62 rows=40 width=8)
                                 Filter: ((hd_dep_count = 1) OR (hd_dep_count = 
2))
{noformat}



  was:
If we take a simple query like such:

{noformat}
select avg(ss_quantity)
from store_sales 
join household_demographics on (ss_hdemo_sk=hd_demo_sk)
where (ss_sales_price between 0 and 100 and hd_dep_count = 1)
 or (ss_sales_price between 100 and 200 and hd_dep_count = 2);
{noformat}

and look at the plan we see that neither table scan has any predicates pushed 
to it, the only filter is in the join

(from impalad version 2.12.0-cdh5.16.x RELEASE (build 
3f68649c7bf8a01fb6ba0cbe35dd2492adb836dd)
{noformat}
PLAN-ROOT SINK
|
06:AGGREGATE [FINALIZE]
| output: avg:merge(ss_quantity)
|
05:EXCHANGE [UNPARTITIONED]
|
03:AGGREGATE
| output: avg(ss_quantity)
|
02:HASH JOIN [INNER JOIN, BROADCAST]
| hash predicates: ss_hdemo_sk = hd_demo_sk
| other predicates: (ss_sales_price >= 0 AND ss_sales_price <= 100 AND 
hd_dep_count = 1) OR (ss_sales_price >= 100 AND ss_sales_price <= 200 AND 
hd_dep_count = 2)
| runtime filters: RF000 <- hd_demo_sk
|
|--04:EXCHANGE [BROADCAST]
| |
| 01:SCAN HDFS [tpcds_1000_parquet.household_demographics]
| partitions=1/1 files=1 size=41.08KB
|
00:SCAN HDFS [tpcds_1000_parquet.store_sales]
 partitions=1824/1824 files=1824 size=189.24GB
 runtime filters: RF000 -> ss_hdemo_sk
{noformat}

If we look at PostgreSQL 11.1, we'll see that not only does the join filter, 
but the table scan has the appropriate filters pushed to it.

{noformat}
 Finalize Aggregate  (cost=67549.69..67549.70 rows=1 width=32)
   ->  Gather  (cost=67549.47..67549.68 rows=2 width=32)
         Workers Planned: 2
         ->  Partial Aggregate  (cost=66549.47..66549.48 rows=1 width=32)
               ->  Hash Join  (cost=113.12..66547.64 rows=734 width=4)
                     Hash Cond: (store_sales.ss_hdemo_sk = 
household_demographics.hd_demo_sk)
                     Join Filter: (((store_sales.ss_sales_price >= 
'0'::numeric) AND 
                                    (store_sales.ss_sales_price <= 
'100'::numeric) AND 
                                    (household_demographics.hd_dep_count = 1)) 
OR 
                                   ((store_sales.ss_sales_price >= 
'100'::numeric) AND 
                                    (store_sales.ss_sales_price <= 
'200'::numeric) AND 
                                    (household_demographics.hd_dep_count = 2)))
                     ->  Parallel Seq Scan on store_sales  (cost=0.00..66343.20 
rows=7305 width=22)
                           Filter: (((ss_sales_price >= '0'::numeric) AND 
(ss_sales_price <= '100'::numeric)) OR 
                                    ((ss_sales_price >= '100'::numeric) AND 
(ss_sales_price <= '200'::numeric)))
                     ->  Hash  (cost=112.62..112.62 rows=40 width=8)
                           ->  Seq Scan on household_demographics  
(cost=0.00..112.62 rows=40 width=8)
                                 Filter: ((hd_dep_count = 1) OR (hd_dep_count = 
2))
{noformat}




> Planner does not push through predicates when there is a disjunction
> --------------------------------------------------------------------
>
>                 Key: IMPALA-8165
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8165
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>            Reporter: Greg Rahn
>            Priority: Major
>              Labels: planner, tpc-ds
>
> If we take a simple query like such:
> {noformat}
> select avg(ss_quantity)
> from store_sales 
> join household_demographics on (ss_hdemo_sk=hd_demo_sk)
> where (ss_sales_price between 0 and 100 and hd_dep_count = 1)
>  or (ss_sales_price between 100 and 200 and hd_dep_count = 2);
> {noformat}
> and look at the plan we see that neither table scan has any predicates pushed 
> to it, the only filter is in the join
> (from impalad version 2.12.0-cdh5.16.x RELEASE (build 
> 3f68649c7bf8a01fb6ba0cbe35dd2492adb836dd)
> {noformat}
> PLAN-ROOT SINK
> |
> 06:AGGREGATE [FINALIZE]
> | output: avg:merge(ss_quantity)
> |
> 05:EXCHANGE [UNPARTITIONED]
> |
> 03:AGGREGATE
> | output: avg(ss_quantity)
> |
> 02:HASH JOIN [INNER JOIN, BROADCAST]
> | hash predicates: ss_hdemo_sk = hd_demo_sk
> | other predicates: (ss_sales_price >= 0 AND ss_sales_price <= 100 AND 
> hd_dep_count = 1) OR 
> |                             (ss_sales_price >= 100 AND ss_sales_price <= 
> 200 AND hd_dep_count = 2)
> | runtime filters: RF000 <- hd_demo_sk
> |
> |--04:EXCHANGE [BROADCAST]
> | |
> | 01:SCAN HDFS [tpcds_1000_parquet.household_demographics]
> | partitions=1/1 files=1 size=41.08KB
> |
> 00:SCAN HDFS [tpcds_1000_parquet.store_sales]
>  partitions=1824/1824 files=1824 size=189.24GB
>  runtime filters: RF000 -> ss_hdemo_sk
> {noformat}
> If we look at PostgreSQL 11.1, we'll see that not only does the join filter, 
> but the table scan has the appropriate filters pushed to it.
> {noformat}
>  Finalize Aggregate  (cost=67549.69..67549.70 rows=1 width=32)
>    ->  Gather  (cost=67549.47..67549.68 rows=2 width=32)
>          Workers Planned: 2
>          ->  Partial Aggregate  (cost=66549.47..66549.48 rows=1 width=32)
>                ->  Hash Join  (cost=113.12..66547.64 rows=734 width=4)
>                      Hash Cond: (store_sales.ss_hdemo_sk = 
> household_demographics.hd_demo_sk)
>                      Join Filter: (((store_sales.ss_sales_price >= 
> '0'::numeric) AND 
>                                     (store_sales.ss_sales_price <= 
> '100'::numeric) AND 
>                                     (household_demographics.hd_dep_count = 
> 1)) OR 
>                                    ((store_sales.ss_sales_price >= 
> '100'::numeric) AND 
>                                     (store_sales.ss_sales_price <= 
> '200'::numeric) AND 
>                                     (household_demographics.hd_dep_count = 
> 2)))
>                      ->  Parallel Seq Scan on store_sales  
> (cost=0.00..66343.20 rows=7305 width=22)
>                            Filter: (((ss_sales_price >= '0'::numeric) AND 
> (ss_sales_price <= '100'::numeric)) OR 
>                                     ((ss_sales_price >= '100'::numeric) AND 
> (ss_sales_price <= '200'::numeric)))
>                      ->  Hash  (cost=112.62..112.62 rows=40 width=8)
>                            ->  Seq Scan on household_demographics  
> (cost=0.00..112.62 rows=40 width=8)
>                                  Filter: ((hd_dep_count = 1) OR (hd_dep_count 
> = 2))
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Updated] (IMPALA-8165) Planner does not push through predicates when there is a disjunction

Reply via email to