[ 
https://issues.apache.org/jira/browse/IMPALA-6258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy resolved IMPALA-6258.
---------------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 2.12.0

Fixed in 
https://github.com/apache/impala/commit/0eaab69fff82a62fbddaae8a0d4ee7a4302ee715

> Uninitialized tuple pointers in row batch for empty rows
> --------------------------------------------------------
>
>                 Key: IMPALA-6258
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6258
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.11.0
>            Reporter: Michael Ho
>            Assignee: Zoltán Borók-Nagy
>            Priority: Critical
>              Labels: correctness
>             Fix For: Impala 2.12.0
>
>
> During [code review|https://gerrit.cloudera.org/#/c/8623/] of IMPALA-6187, it 
> was noticed that the tuple pointers in the generated row batches may not be 
> initialized if a tuple has byte size 0. It's unclear whether there are edge 
> cases in which the code dereferences these uninitialized tuple 
> pointers. In addition, some code compares these uninitialized 
> pointers against NULL, so leaving them uninitialized may produce wrong 
> (and non-deterministic) results:
> {noformat}
> BooleanVal TupleIsNullPredicate::GetBooleanVal(
>     ScalarExprEvaluator* evaluator, const TupleRow* row) const {
>   int count = 0;
>   for (int i = 0; i < tuple_idxs_.size(); ++i) {
>     count += row->GetTuple(tuple_idxs_[i]) == NULL;
>   }
>   // Return true only if all originally specified tuples are NULL. Return false if any
>   // tuple is non-nullable.
>   return BooleanVal(count == tuple_ids_.size());
> }
> {noformat}
> [~tarmstrong] came up with the following example:
> {noformat}
>   SELECT /* +straight_join */ COUNT(t1.id)
>   FROM functional.alltypessmall t1
>   LEFT OUTER JOIN (
>     SELECT /* +straight_join */ IFNULL(t2.int_col, 1) AS c
>     FROM functional.alltypessmall t2
>     LEFT OUTER JOIN functional.alltypestiny t3 ON t2.id < 1000
>   ) v ON t1.int_col = v.c;
> {noformat}
> The relevant part of the plan is:
> {noformat}
>     04:HASH JOIN [LEFT OUTER JOIN, PARTITIONED]
>     |  hash predicates: t1.int_col = if(TupleIsNull(1, 2), NULL, ifnull(t2.int_col, 1))
>     |  fk/pk conjuncts: assumed fk/pk
>     |  mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB
>     |  tuple-ids=0,1N,2N row-size=16B cardinality=100
>     |
>     |--08:EXCHANGE [HASH(if(TupleIsNull(1, 2), NULL, ifnull(t2.int_col, 1)))]
>     |  |  mem-estimate=0B mem-reservation=0B
>     |  |  tuple-ids=1,2N row-size=8B cardinality=100
>     |  |
>     |  F01:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
>     |  Per-Host Resources: mem-estimate=32.00MB mem-reservation=0B
>     |  03:NESTED LOOP JOIN [LEFT OUTER JOIN, BROADCAST]
>     |  |  join predicates: t2.id < 1000
>     |  |  mem-estimate=0B mem-reservation=0B
>     |  |  tuple-ids=1,2N row-size=8B cardinality=100
>     |  |
>     |  |--06:EXCHANGE [BROADCAST]
>     |  |  |  mem-estimate=0B mem-reservation=0B
>     |  |  |  tuple-ids=2 row-size=0B cardinality=8
>     |  |  |
>     |  |  F02:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
>     |  |  Per-Host Resources: mem-estimate=32.00MB mem-reservation=0B
>     |  |  02:SCAN HDFS [functional.alltypestiny t3, RANDOM]
>     |  |     partitions=4/4 files=4 size=460B
>     |  |     stats-rows=8 extrapolated-rows=disabled
>     |  |     table stats: rows=8 size=unavailable
>     |  |     column stats: all
>     |  |     mem-estimate=32.00MB mem-reservation=0B
>     |  |     tuple-ids=2 row-size=0B cardinality=8
>     |  |
>     |  01:SCAN HDFS [functional.alltypessmall t2, RANDOM]
>     |     partitions=4/4 files=4 size=6.32KB
>     |     stats-rows=100 extrapolated-rows=disabled
>     |     table stats: rows=100 size=unavailable
>     |     column stats: all
>     |     mem-estimate=32.00MB mem-reservation=0B
>     |     tuple-ids=1 row-size=8B cardinality=100
> {noformat}
> We should fix this by setting the pointers for these empty tuples to a dummy 
> non-NULL value.
> Alex came up with this query, which currently produces non-deterministic 
> results:
> {noformat}
> select count(v.x) from functional.alltypestiny t3 left outer join (select 
> true as x from functional.alltypestiny t1 left outer join 
> functional.alltypestiny t2 on (true)) v on (v.x = t3.bool_col) where 
> t3.bool_col = true;
> {noformat}
> {noformat}
> select count(v.x) from functional_kudu.alltypestiny t3 left outer join 
> (select true as x from functional_kudu.alltypestiny t1 left outer join 
> functional_kudu.alltypestiny t2 on (true)) v on (v.x = t3.bool_col) where 
> t3.bool_col = true;
> {noformat}
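
For readers skimming the description above, here is a minimal sketch of the proposed idea: give zero-byte tuples a dummy non-NULL pointer so that NULL checks on them are deterministic. It is not the code from the fix commit. The names Tuple, Row, EmptyTuplePtr, SetTuple and AllTuplesNull are hypothetical stand-ins, not Impala's classes, and the counting loop only imitates the TupleIsNullPredicate snippet quoted above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a materialized tuple; not Impala's Tuple class.
struct Tuple {};

// Sentinel that zero-byte tuples point at. It is never dereferenced; it only
// has to compare unequal to nullptr.
inline Tuple* EmptyTuplePtr() {
  static Tuple dummy;
  return &dummy;
}

// A row modelled as a fixed-size array of tuple pointers, all explicitly
// initialized to nullptr on construction.
struct Row {
  explicit Row(std::size_t num_tuples) : tuples(num_tuples, nullptr) {}

  Tuple* GetTuple(std::size_t idx) const {
    assert(idx < tuples.size());
    return tuples[idx];
  }

  std::vector<Tuple*> tuples;
};

// Fills one slot of the row. 'is_null' marks a tuple that is NULL (for
// example, the unmatched side of an outer join). A non-NULL tuple with
// byte size 0 gets the sentinel instead of being left unset.
void SetTuple(Row* row, std::size_t idx, bool is_null, int byte_size,
              Tuple* storage) {
  assert(idx < row->tuples.size());
  if (is_null) {
    row->tuples[idx] = nullptr;
  } else if (byte_size == 0) {
    row->tuples[idx] = EmptyTuplePtr();
  } else {
    row->tuples[idx] = storage;
  }
}

// Imitates the counting logic of the quoted predicate: true only if every
// listed tuple slot is NULL.
bool AllTuplesNull(const Row& row, const std::vector<std::size_t>& tuple_idxs) {
  std::size_t count = 0;
  for (std::size_t idx : tuple_idxs) {
    count += (row.GetTuple(idx) == nullptr) ? 1 : 0;
  }
  return count == tuple_idxs.size();
}
```

With this shape, a row whose second tuple has row-size=0B (like tuple 2 in the plan above) reports "not NULL" when the join matched and "NULL" only when the slot was explicitly set to nullptr, so the result no longer depends on whatever bytes happened to be in the slot.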



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
