[ https://issues.apache.org/jira/browse/IMPALA-10064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aman Sinha updated IMPALA-10064:
--------------------------------
    Description: 
Consider the following table schema, view and 2 queries on the view:
{noformat}
create table tt1 (a1 int, b1 int, ts timestamp) partitioned by (mydate date);
create view tt1_view as (select a1, b1, ts from tt1 where mydate = cast(ts as date));

// query 1:  (Good) constant on ts gets propagated
explain select * from tt1_view where ts = '2019-07-01';
00:SCAN HDFS [db1.tt1]
   partition predicates: mydate = DATE '2019-07-01'
   HDFS partitions=1/3 files=2 size=48B
   predicates: db1.tt1.ts = TIMESTAMP '2019-07-01 00:00:00'
   row-size=24B cardinality=1

// query 2: (Not good) constant on ts does not get propagated
explain select * from tt1_view where ts > '2019-07-01';
00:SCAN HDFS [db1.tt1]
   HDFS partitions=3/3 files=4 size=96B
   predicates: db1.tt1.ts > TIMESTAMP '2019-07-01 00:00:00', mydate = CAST(ts AS DATE)
   row-size=28B cardinality=1

{noformat}

Note that in query 1, with the equality condition on 'ts', the constant value is 
propagated to the 'mydate = CAST(ts AS DATE)' predicate, which is then applied 
as a partition predicate.  In query 2, which has a range predicate, the constant 
is not propagated and no partition predicate is created for the scan, so all 3 
partitions are scanned.  We should support constant propagation in the second 
case as well.  Range predicates (>, >=, <, <=) involving date or timestamp 
literals should be considered, but we have to analyze the cases in which the 
propagation is valid.  For example, with functions such as date_add and 
date_diff, is there a potential for incorrect propagation?
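
One way to reason about validity is sketched below in Python (not Impala 
frontend code).  It assumes CAST(ts AS DATE) simply drops the time of day, so 
the cast never decreases as ts grows and a range bound on ts implies a bound on 
the date.  The helper name implied_date_predicate and its (operator, date) 
return shape are hypothetical and used only for illustration.

```python
from datetime import date, datetime, time
from typing import Tuple


def implied_date_predicate(op: str, ts_literal: datetime) -> Tuple[str, date]:
    """Hypothetical helper: given `ts <op> ts_literal`, return the predicate
    implied on CAST(ts AS DATE), assuming the cast truncates the time of day."""
    d = ts_literal.date()
    at_midnight = ts_literal.time() == time(0, 0)

    if op == "=":
        return ("=", d)
    if op in (">", ">="):
        # A later time on the same day still satisfies ts > literal,
        # so the strict bound must be relaxed to >= on the date.
        return (">=", d)
    if op == "<=":
        return ("<=", d)
    if op == "<":
        # ts < midnight of day d excludes every row of day d;
        # otherwise rows earlier on day d still qualify.
        return ("<", d) if at_midnight else ("<=", d)
    raise ValueError("unsupported operator: " + op)


# Query 2 from above: ts > '2019-07-01'
print(implied_date_predicate(">", datetime(2019, 7, 1)))
```

For query 2 this yields mydate >= DATE '2019-07-01' rather than a strict 
comparison, because a row with ts = '2019-07-01 10:00:00' satisfies the 
predicate and lives in the 2019-07-01 partition.  A function that decreases as 
ts grows would flip the direction of the comparison, which is the kind of case 
that needs analysis before propagating.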

Note that a predicate can be a BETWEEN condition such as:
{noformat}
WHERE ts >= '2019-07-01' AND ts <= '2020-07-01'
{noformat}
In this case both bounds need to be propagated.
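
The effect of applying both bounds can be sketched as follows.  This is an 
illustrative Python model, not planner code: prune_partitions and the sample 
partition values are hypothetical, and it assumes the view's filter 
mydate = CAST(ts AS DATE) holds for every row.

```python
from datetime import date, datetime
from typing import List, Optional


def prune_partitions(partitions: List[date],
                     ts_low: Optional[datetime] = None,
                     ts_high: Optional[datetime] = None) -> List[date]:
    """Hypothetical sketch: keep the mydate partitions that can hold rows with
    ts_low <= ts <= ts_high, given mydate = CAST(ts AS DATE)."""
    # Inclusive ts bounds translate to inclusive bounds on the date.
    lo = ts_low.date() if ts_low is not None else date.min
    hi = ts_high.date() if ts_high is not None else date.max
    return [p for p in partitions if lo <= p <= hi]


# Made-up partition values for illustration only.
parts = [date(2019, 6, 30), date(2019, 12, 15), date(2020, 7, 2)]

# Only the lower bound propagated: the 2020-07-02 partition is still scanned.
lower_only = prune_partitions(parts, ts_low=datetime(2019, 7, 1))

# Both bounds propagated: only partitions inside the range remain.
both = prune_partitions(parts, datetime(2019, 7, 1), datetime(2020, 7, 1))
```

With these sample values, propagating only the lower bound keeps two 
partitions, while propagating both keeps one.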




  was:
Consider the following table schema, view and 2 queries on the view:
{noformat}
create table tt1 (a1 int, b1 int, ts timestamp) partitioned by (mydate date);
create view tt1_view as (select a1, b1, ts from tt1 where mydate = cast(ts as date));

// query 1:  (Good) constant on ts gets propagated
explain select * from tt1_view where ts = '2019-07-01';
00:SCAN HDFS [db1.tt1]
   partition predicates: mydate = DATE '2019-07-01'
   HDFS partitions=1/3 files=2 size=48B
   predicates: db1.tt1.ts = TIMESTAMP '2019-07-01 00:00:00'
   row-size=24B cardinality=1

// query 2: (Not good) constant on ts does not get propagated
explain select * from tt1_view where ts > '2019-07-01';
00:SCAN HDFS [db1.tt1]
   HDFS partitions=3/3 files=4 size=96B
   predicates: db1.tt1.ts > TIMESTAMP '2019-07-01 00:00:00', mydate = CAST(ts AS DATE)
   row-size=28B cardinality=1

{noformat}

Note that in query 1, with the equality condition on 'ts', the constant value is 
propagated to the 'mydate = CAST(ts AS DATE)' predicate, which is then applied 
as a partition predicate.  In query 2, which has a range predicate, the constant 
is not propagated and no partition predicate is created for the scan.  We 
should support constant propagation in the second case as well.  Range 
predicates (>, >=, <, <=) involving numeric or date literals should be 
considered.

Note that a predicate can be a BETWEEN condition such as:
{noformat}
WHERE ts >= '2019-07-01' AND ts <= '2020-07-01'
{noformat}
In this case both bounds need to be propagated.





> Support constant propagation for range predicates
> -------------------------------------------------
>
>                 Key: IMPALA-10064
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10064
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>    Affects Versions: Impala 3.4.0
>            Reporter: Aman Sinha
>            Assignee: Aman Sinha
>            Priority: Major


