[jira] [Commented] (IMPALA-11030) Wrong result due to predicate pushdown into inline view with Analytic function

ASF subversion and git services (Jira) Mon, 10 Jan 2022 16:03:05 -0800


    [ 
https://issues.apache.org/jira/browse/IMPALA-11030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17472367#comment-17472367
 ]


ASF subversion and git services commented on IMPALA-11030:
----------------------------------------------------------

Commit 6894085e4e280d728957e8b9fe175be23bfc6ec0 in impala's branch 
refs/heads/master from Aman Sinha
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=6894085 ]

IMPALA-11030: Fix incorrect creation of common partition exprs

When there are 2 or more analytic functions in an inline view
and at least one of them does not have a partition-by expr,
we were previously still populating the commonPartitionExprs
list in AnalyticInfo. This common partition expr was then
used during the auxiliary predicate creation when the outer
query has a predicate on partition-by column. This leads to
wrong result because the auxiliary predicate is pushed down
to the table scan. While pushing down predicate on a
partitioning column is okay if all the analytic functions
contain that partitioning column, it is not correct to do
this when at least one analytic function does not have that
partitioning column.

This patch fixes the wrong result by ensuring that the
AnalyticInfo's commonPartitionExprs is empty if at least
one analytic function does not have partitioning exprs.

Testing:
 - Added new planner test and e2e test for row_num
   analytic function

Change-Id: Iebb51f691e8e5459ffbaf5a49907140f2de212cc
Reviewed-on: http://gerrit.cloudera.org:8080/18072
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Aman Sinha <[email protected]>


> Wrong result due to predicate pushdown into inline view with Analytic function
> ------------------------------------------------------------------------------
>
>                 Key: IMPALA-11030
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11030
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 3.4.0
>            Reporter: Aman Sinha
>            Assignee: Aman Sinha
>            Priority: Major
>
> Table DDL and population:
> {noformat}
> create table t1( c1 int, c2 char(1));
> insert into t1 values (1,cast('P' as char(1))),(2,cast('P' as 
> char(1))),(3,cast('P' as char(1))),(4,cast('N' as char(1))),(5,cast('P' as 
> char(1))),(6, cast('N' as char(1))),(7, cast('P' as char(1))),(8, cast('N' as 
> char(1))),(9, cast('N' as char(1))),(10, cast('N' as char(1))), (11,cast('P' 
> as char(1))),(12,cast('N' as char(1))),(13,cast('P' as char(1))),(14,cast('N' 
> as char(1))),(15,cast('N' as char(1))),(16, cast('N' as char(1))),(17, 
> cast('P' as char(1))),(18, cast('N' as char(1))),(19, cast('P' as 
> char(1))),(20, cast('N' as char(1)));
> {noformat}
> {noformat}
> default> select * from t1;
> ------+
> c1    c2
> ------+
> 11    P
> 12    N
> 13    P
> 14    N
> 15    N
> 16    N
> 17    P
> 18    N
> 19    P
> 20    N
> 1     P
> 2     P
> 3     P
> 4     N
> 5     P
> 6     N
> 7     P
> 8     N
> 9     N
> 10    N
> ------+
> The following query produces a wrong num_row() for num_ranks column.
> default> select * from (select c1, c2 , row_number() over(order by c1) as 
> num_ranks, row_number() over( partition by c2 order by c1) as prime_rank from 
> t1) a where c2='P';
> -------------------------+
> c1    c2      num_ranks       prime_rank
> -------------------------+
> 1     P       1       1
> 2     P       2       2
> 3     P       3       3
> 5     P       4       4
> 7     P       5       5
> 11    P       6       6
> 13    P       7       7
> 17    P       8       8
> 19    P       9       9
> -------------------------+
> {noformat}
> The plan indicates that the predicate c2='P' is incorrectly pushed to the 
> scan and affects the order of operations in the SQL statement.
> {noformat}
> Query: explain select * from (select c1, c2 , row_number() over(order by c1) 
> as num_ranks, row_number() over( partition by c2 order by c1) as prime_rank 
> from t1) a where c2='P'
> +------------------------------------------------------------------------------------------+
> | Explain String                                                              
>              |
> +------------------------------------------------------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=20.00MB Threads=2                 
>              |
> | Per-Host Resource Estimates: Memory=30MB                                    
>              |
> | Codegen disabled by planner                                                 
>              |
> | Analyzed query: SELECT * FROM (SELECT c1, c2, row_number() OVER (ORDER BY 
> c1             |
> | ASC) num_ranks, row_number() OVER (PARTITION BY c2 ORDER BY c1 ASC) 
> prime_rank           |
> | FROM `default`.t1) a WHERE CAST(c2 AS STRING) = 'P'                         
>              |
> |                                                                             
>              |
> | F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1                       
>              |
> | |  Per-Host Resources: mem-estimate=30.00MB mem-reservation=20.00MB 
> thread-reservation=2 |
> | PLAN-ROOT SINK                                                              
>              |
> | |  output exprs: c1, c2, row_number(), row_number()                         
>              |
> | |  mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB 
> thread-reservation=0   |
> | |                                                                           
>              |
> | 04:ANALYTIC                                                                 
>              |
> | |  functions: row_number()                                                  
>              |
> | |  order by: c1 ASC                                                         
>              |
> | |  window: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW                 
>              |
> | |  mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB 
> thread-reservation=0   |
> | |  tuple-ids=7,3 row-size=21B cardinality=10                                
>              |
> | |  in pipelines: 03(GETNEXT)                                                
>              |
> | |                                                                           
>              |
> | 03:SORT                                                                     
>              |
> | |  order by: c1 ASC                                                         
>              |
> | |  mem-estimate=6.00MB mem-reservation=6.00MB spill-buffer=2.00MB 
> thread-reservation=0   |
> | |  tuple-ids=7 row-size=13B cardinality=10                                  
>              |
> | |  in pipelines: 03(GETNEXT), 01(OPEN)                                      
>              |
> | |                                                                           
>              |
> | 02:ANALYTIC                                                                 
>              |
> | |  functions: row_number()                                                  
>              |
> | |  partition by: c2                                                         
>              |
> | |  order by: c1 ASC                                                         
>              |
> | |  window: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW                 
>              |
> | |  mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB 
> thread-reservation=0   |
> | |  tuple-ids=5,4 row-size=13B cardinality=10                                
>              |
> | |  in pipelines: 01(GETNEXT)                                                
>              |
> | |                                                                           
>              |
> | 01:SORT                                                                     
>              |
> | |  order by: c2 ASC NULLS LAST, c1 ASC                                      
>              |
> | |  mem-estimate=6.00MB mem-reservation=6.00MB spill-buffer=2.00MB 
> thread-reservation=0   |
> | |  tuple-ids=5 row-size=5B cardinality=10                                   
>              |
> | |  in pipelines: 01(GETNEXT), 00(OPEN)                                      
>              |
> | |                                                                           
>              |
> | 00:SCAN HDFS [default.t1]                                                   
>              |
> |    HDFS partitions=1/1 files=2 size=91B                                     
>              |
> |    predicates: CAST(default.t1.c2 AS STRING) = 'P'                          
>              |
> |    stored statistics:                                                       
>              |
> |      table: rows=20 size=91B                                                
>              |
> |      columns: all                                                           
>              |
> |    extrapolated-rows=disabled max-scan-range-rows=10                        
>              |
> |    mem-estimate=16.00MB mem-reservation=8.00KB thread-reservation=1         
>              |
> |    tuple-ids=0 row-size=5B cardinality=10                                   
>              |
> |    in pipelines: 00(GETNEXT)                                                
>              |
> +------------------------------------------------------------------------------------------+
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (IMPALA-11030) Wrong result due to predicate pushdown into inline view with Analytic function

Reply via email to