[ 
https://issues.apache.org/jira/browse/IMPALA-10182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17259290#comment-17259290
 ] 

ASF subversion and git services commented on IMPALA-10182:
----------------------------------------------------------

Commit 49680559b0da843fbb6ff949d52c9d43f98364b1 in impala's branch 
refs/heads/master from Aman Sinha
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=4968055 ]

IMPALA-10182: Don't add inferred identity predicates to SELECT node

For an inferred equality predicates of type c1 = c2 if both sides
are referring to the same underlying tuple and slot, it is an identity
predicate which should not be evaluated by the SELECT node since it
will incorrectly eliminate NULL rows. This patch fixes the behavior.

Testing:
 - Added planner tests with base table and with outer join
 - Added runtime tests with base table and with outer join
 - Added planner test for IMPALA-9694 (same root cause)
 - Ran PlannerTest .. no other plans changed

Change-Id: I924044f582652dbc50085851cc639f3dee1cd1f4
Reviewed-on: http://gerrit.cloudera.org:8080/16917
Reviewed-by: Aman Sinha <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Rows with NULLs filtered out with duplicate columns in subquery select inside 
> UNION ALL
> ---------------------------------------------------------------------------------------
>
>                 Key: IMPALA-10182
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10182
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>            Reporter: Tim Armstrong
>            Assignee: Aman Sinha
>            Priority: Blocker
>              Labels: correctness
>
> Bug report from here - 
> https://community.cloudera.com/t5/Support-Questions/quot-union-all-quot-dropping-records-with-all-null-empty/m-p/303153#M221415
> Repro:
> {noformat}
> create database if not exists as_adventure;
> use as_adventure;
> CREATE tABLE IF NOT EXISTS
>     as_adventure.t1 
>     ( 
>         productsubcategorykey INT, 
>         productline STRING);
> insert into t1 values (1,'l1');
> insert into t1 values (2,'l1');
> insert into t1 values (1,'l2');
> insert into t1 values (3,'l3');
> insert into t1 values (null,'');
> select * from t1; 
> SELECT
>     MIN(t_53.c_41)       c_41,
>     CAST(NULL AS DOUBLE) c_43,
>     CAST(NULL AS BIGINT) c_44,
>     t_53.c2              c2,
>     t_53.c3s0            c3s0,
>     t_53.c4              c4,
>     t_53.c5s0            c5s0
> FROM
>     (   SELECT
>             t.productsubcategorykey c_41,
>             t.productline           c2,
>             t.productline           c3s0,
>             t.productsubcategorykey c4,
>             t.productsubcategorykey c5s0
>         FROM
>             as_adventure.t1 t
>         WHERE
>             true
>         GROUP BY
>             2,
>             3,
>             4,
>             5 ) t_53
> GROUP BY
>     4,
>     5,
>     6,
>     7
>  
> UNION ALL
> SELECT
>     MIN(t_53.c_41)       c_41,
>     CAST(NULL AS DOUBLE) c_43,
>     CAST(NULL AS BIGINT) c_44,
>     t_53.c2              c2,
>     t_53.c3s0            c3s0,
>     t_53.c4              c4,
>     t_53.c5s0            c5s0
> FROM
>     (   SELECT
>             t.productsubcategorykey c_41,
>             t.productline           c2,
>             t.productline           c3s0,
>             t.productsubcategorykey c4,
>             t.productsubcategorykey c5s0
>         FROM
>             as_adventure.t1 t
>         WHERE
>             true
>         GROUP BY
>             2,
>             3,
>             4,
>             5 ) t_53
> GROUP BY
>     4,
>     5,
>     6,
>     7
> {noformat}
> Somewhat similar to IMPALA-7957 in that the inferred predicates from the 
> column equivalences get placed in a Select node. It's a bit different in that 
> the NULLs that are filtered out from the predicates come from the base table.
> {noformat}
> +------------------------------------------------------------------------------------------------+
> | Explain String                                                              
>                    |
> +------------------------------------------------------------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=136.02MB Threads=6                
>                    |
> | Per-Host Resource Estimates: Memory=576MB                                   
>                    |
> | WARNING: The following tables are missing relevant table and/or column 
> statistics.             |
> | as_adventure.t1                                                             
>                    |
> |                                                                             
>                    |
> | PLAN-ROOT SINK                                                              
>                    |
> | |                                                                           
>                    |
> | 13:EXCHANGE [UNPARTITIONED]                                                 
>                    |
> | |                                                                           
>                    |
> | 00:UNION                                                                    
>                    |
> | |  row-size=52B cardinality=2                                               
>                    |
> | |                                                                           
>                    |
> | |--08:SELECT                                                                
>                    |
> | |  |  predicates: t_53.c4 = t_53.c5s0, t_53.c2 = t_53.c3s0                  
>                    |
> | |  |  row-size=36B cardinality=1                                            
>                    |
> | |  |                                                                        
>                    |
> | |  07:AGGREGATE [FINALIZE]                                                  
>                    |
> | |  |  output: min(t.productsubcategorykey)                                  
>                    |
> | |  |  group by: t.productline, t.productline, t.productsubcategorykey, 
> t.productsubcategorykey |
> | |  |  row-size=36B cardinality=2                                            
>                    |
> | |  |                                                                        
>                    |
> | |  12:AGGREGATE [FINALIZE]                                                  
>                    |
> | |  |  group by: t.productline, t.productsubcategorykey                      
>                    |
> | |  |  row-size=16B cardinality=2                                            
>                    |
> | |  |                                                                        
>                    |
> | |  11:EXCHANGE [HASH(t.productline,t.productsubcategorykey)]                
>                    |
> | |  |                                                                        
>                    |
> | |  06:AGGREGATE [STREAMING]                                                 
>                    |
> | |  |  group by: t.productline, t.productsubcategorykey                      
>                    |
> | |  |  row-size=16B cardinality=2                                            
>                    |
> | |  |                                                                        
>                    |
> | |  05:SCAN HDFS [as_adventure.t1 t]                                         
>                    |
> | |     HDFS partitions=1/1 files=5 size=24B                                  
>                    |
> | |     row-size=16B cardinality=2                                            
>                    |
> | |                                                                           
>                    |
> | 04:SELECT                                                                   
>                    |
> | |  predicates: t_53.c4 = t_53.c5s0, t_53.c2 = t_53.c3s0                     
>                    |
> | |  row-size=36B cardinality=1                                               
>                    |
> | |                                                                           
>                    |
> | 03:AGGREGATE [FINALIZE]                                                     
>                    |
> | |  output: min(t.productsubcategorykey)                                     
>                    |
> | |  group by: t.productline, t.productline, t.productsubcategorykey, 
> t.productsubcategorykey    |
> | |  row-size=36B cardinality=2                                               
>                    |
> | |                                                                           
>                    |
> | 10:AGGREGATE [FINALIZE]                                                     
>                    |
> | |  group by: t.productline, t.productsubcategorykey                         
>                    |
> | |  row-size=16B cardinality=2                                               
>                    |
> | |                                                                           
>                    |
> | 09:EXCHANGE [HASH(t.productline,t.productsubcategorykey)]                   
>                    |
> | |                                                                           
>                    |
> | 02:AGGREGATE [STREAMING]                                                    
>                    |
> | |  group by: t.productline, t.productsubcategorykey                         
>                    |
> | |  row-size=16B cardinality=2                                               
>                    |
> | |                                                                           
>                    |
> | 01:SCAN HDFS [as_adventure.t1 t]                                            
>                    |
> |    HDFS partitions=1/1 files=5 size=24B                                     
>                    |
> |    row-size=16B cardinality=2                                               
>                    |
> +------------------------------------------------------------------------------------------------+
> {noformat}
> This query is a simplified version of the original one:
> {noformat}
> SELECT
>     t_53.c4              c4,
>     t_53.c5s0            c5s0
> FROM
>     (   SELECT
>             t.productsubcategorykey c4,
>             t.productsubcategorykey c5s0
>         FROM
>             as_adventure.t1 t
>         GROUP BY
>             1,
>             2 ) t_53
> GROUP BY 1,2
> UNION ALL
> SELECT
>     t_53.c4              c4,
>     t_53.c5s0            c5s0
> FROM
>     (   SELECT
>             t.productsubcategorykey c4,
>             t.productsubcategorykey c5s0
>         FROM
>             as_adventure.t1 t
>         GROUP BY
>             1, 2 ) t_53
> GROUP BY 1, 2;
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to