[ 
https://issues.apache.org/jira/browse/IMPALA-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16965505#comment-16965505
 ] 

Antoni Ivanov commented on IMPALA-1369:
---------------------------------------

Seems to still be an issue with the latest version of Impala. Performance of 
null-aware anti-join are comparable to those of the Nested loop join (way too 
slow).

> Slow query with NAAJs
> ---------------------
>
>                 Key: IMPALA-1369
>                 URL: https://issues.apache.org/jira/browse/IMPALA-1369
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Perf Investigation
>    Affects Versions: Impala 2.0
>            Reporter: Ippokratis Pandis
>            Priority: Minor
>              Labels: performance
>
> With IMPALA-1351 the query below runs correctly, but it takes too much time, 
> a few hours in my dev box. I think because essentially the NAAJs become 
> something like nested loop joins. Postgres runs it in some low tenths of 
> minutes. 
> Below it is the query, and the explains of both Impala and Postgres.
> {code}
> SELECT COUNT(t2.month) AS int_col_1
> FROM alltypesagg t1 CROSS
> JOIN alltypesagg t2 FULL
> OUTER JOIN alltypestiny t3 ON t3.year = t2.int_col
> AND t3.int_col = t2.year
> WHERE t1.day NOT IN
>     (SELECT tt1.year AS int_col_1
>      FROM alltypes tt1
>      RIGHT JOIN alltypes tt2 ON tt2.month = tt1.year
>      WHERE t3.tinyint_col = tt2.tinyint_col
>        AND t3.bigint_col = tt2.bigint_col
>        AND t3.int_col = tt1.id)
>   AND t1.day NOT IN
>     (SELECT ttt1.tinyint_col AS tinyint_col_1
>      FROM alltypestiny ttt1
>      WHERE t1.tinyint_col = ttt1.month
>        AND ttt1.month IS NULL
>        AND ttt1.tinyint_col NOT IN
>          (SELECT tttt2.int_col AS int_col_1
>           FROM alltypesagg tttt1
>           RIGHT JOIN alltypes tttt2 ON tttt2.id = tttt1.year)
>        AND ttt1.month NOT IN
>          (SELECT MIN(tttt2.year + tttt1.smallint_col * tttt1.int_col - 
> tttt2.bigint_col) + MAX(tttt2.tinyint_col) AS bigint_col_1
>           FROM alltypestiny tttt1
>           INNER JOIN alltypestiny tttt2 ON tttt2.month = tttt1.smallint_col)
>        AND ttt1.month IS NOT NULL
>        AND t2.smallint_col = ttt1.bigint_col)
> {code}
> The Impala explain"
> {code}
> +--------------------------------------------------------------------------------------------------------------------------------------+
> | Explain String                                                              
>                                                          |
> +--------------------------------------------------------------------------------------------------------------------------------------+
> | Estimated Per-Host Requirements: Memory=1.66GB VCores=5                     
>                                                          |
> |                                                                             
>                                                          |
> | 36:AGGREGATE [FINALIZE]                                                     
>                                                          |
> | |  output: count:merge(t2.month)                                            
>                                                          |
> | |                                                                           
>                                                          |
> | 35:EXCHANGE [UNPARTITIONED]                                                 
>                                                          |
> | |                                                                           
>                                                          |
> | 20:AGGREGATE                                                                
>                                                          |
> | |  output: count(t2.month)                                                  
>                                                          |
> | |                                                                           
>                                                          |
> | 19:HASH JOIN [NULL AWARE LEFT ANTI JOIN, BROADCAST]                         
>                                                          |
> | |  hash predicates: t1.day = ttt1.tinyint_col                               
>                                                          |
> | |  other join predicates: t2.smallint_col = ttt1.bigint_col, t1.tinyint_col 
> = ttt1.month                                             |
> | |                                                                           
>                                                          |
> | |--34:EXCHANGE [BROADCAST]                                                  
>                                                          |
> | |  |                                                                        
>                                                          |
> | |  15:HASH JOIN [NULL AWARE LEFT ANTI JOIN, BROADCAST]                      
>                                                          |
> | |  |  hash predicates: ttt1.month = min(tttt2.year + tttt1.smallint_col * 
> tttt1.int_col - tttt2.bigint_col) + max(tttt2.tinyint_col) |
> | |  |                                                                        
>                                                          |
> | |  |--33:EXCHANGE [BROADCAST]                                               
>                                                          |
> | |  |  |                                                                     
>                                                          |
> | |  |  32:AGGREGATE [FINALIZE]                                               
>                                                          |
> | |  |  |  output: min:merge(tttt2.year + tttt1.smallint_col * tttt1.int_col 
> - tttt2.bigint_col), max:merge(tttt2.tinyint_col)         |
> | |  |  |                                                                     
>                                                          |
> | |  |  31:EXCHANGE [UNPARTITIONED]                                           
>                                                          |
> | |  |  |                                                                     
>                                                          |
> | |  |  13:AGGREGATE                                                          
>                                                          |
> | |  |  |  output: min(tttt2.year + tttt1.smallint_col * tttt1.int_col - 
> tttt2.bigint_col), max(tttt2.tinyint_col)                     |
> | |  |  |                                                                     
>                                                          |
> | |  |  12:HASH JOIN [INNER JOIN, BROADCAST]                                  
>                                                          |
> | |  |  |  hash predicates: tttt2.month = tttt1.smallint_col                  
>                                                          |
> | |  |  |                                                                     
>                                                          |
> | |  |  |--30:EXCHANGE [BROADCAST]                                            
>                                                          |
> | |  |  |  |                                                                  
>                                                          |
> | |  |  |  10:SCAN HDFS [functional.alltypestiny tttt1]                       
>                                                          |
> | |  |  |     partitions=4/4 size=460B                                        
>                                                          |
> | |  |  |                                                                     
>                                                          |
> | |  |  11:SCAN HDFS [functional.alltypestiny tttt2]                          
>                                                          |
> | |  |     partitions=4/4 size=460B                                           
>                                                          |
> | |  |                                                                        
>                                                          |
> | |  14:HASH JOIN [NULL AWARE LEFT ANTI JOIN, BROADCAST]                      
>                                                          |
> | |  |  hash predicates: ttt1.tinyint_col = tttt2.int_col                     
>                                                          |
> | |  |                                                                        
>                                                          |
> | |  |--29:EXCHANGE [BROADCAST]                                               
>                                                          |
> | |  |  |                                                                     
>                                                          |
> | |  |  09:HASH JOIN [LEFT OUTER JOIN, PARTITIONED]                           
>                                                          |
> | |  |  |  hash predicates: tttt2.id = tttt1.year                             
>                                                          |
> | |  |  |                                                                     
>                                                          |
> | |  |  |--28:EXCHANGE [HASH(tttt1.year)]                                     
>                                                          |
> | |  |  |  |                                                                  
>                                                          |
> | |  |  |  07:SCAN HDFS [functional.alltypesagg tttt1]                        
>                                                          |
> | |  |  |     partitions=11/11 size=814.73KB                                  
>                                                          |
> | |  |  |                                                                     
>                                                          |
> | |  |  27:EXCHANGE [HASH(tttt2.id)]                                          
>                                                          |
> | |  |  |                                                                     
>                                                          |
> | |  |  08:SCAN HDFS [functional.alltypes tttt2]                              
>                                                          |
> | |  |     partitions=24/24 size=478.45KB                                     
>                                                          |
> | |  |                                                                        
>                                                          |
> | |  06:SCAN HDFS [functional.alltypestiny ttt1]                              
>                                                          |
> | |     partitions=0/4 size=0B                                                
>                                                          |
> | |                                                                           
>                                                          |
> | 18:HASH JOIN [NULL AWARE LEFT ANTI JOIN, BROADCAST]                         
>                                                          |
> | |  hash predicates: t1.day = tt1.year                                       
>                                                          |
> | |  other join predicates: t3.int_col = tt1.id, t3.bigint_col = 
> tt2.bigint_col, t3.tinyint_col = tt2.tinyint_col                      |
> | |                                                                           
>                                                          |
> | |--26:EXCHANGE [BROADCAST]                                                  
>                                                          |
> | |  |                                                                        
>                                                          |
> | |  05:HASH JOIN [LEFT OUTER JOIN, PARTITIONED]                              
>                                                          |
> | |  |  hash predicates: tt2.month = tt1.year                                 
>                                                          |
> | |  |                                                                        
>                                                          |
> | |  |--25:EXCHANGE [HASH(tt1.year)]                                          
>                                                          |
> | |  |  |                                                                     
>                                                          |
> | |  |  03:SCAN HDFS [functional.alltypes tt1]                                
>                                                          |
> | |  |     partitions=24/24 size=478.45KB                                     
>                                                          |
> | |  |                                                                        
>                                                          |
> | |  24:EXCHANGE [HASH(tt2.month)]                                            
>                                                          |
> | |  |                                                                        
>                                                          |
> | |  04:SCAN HDFS [functional.alltypes tt2]                                   
>                                                          |
> | |     partitions=24/24 size=478.45KB                                        
>                                                          |
> | |                                                                           
>                                                          |
> | 17:HASH JOIN [FULL OUTER JOIN, PARTITIONED]                                 
>                                                          |
> | |  hash predicates: t2.int_col = t3.year, t2.year = t3.int_col              
>                                                          |
> | |                                                                           
>                                                          |
> | |--23:EXCHANGE [HASH(t3.year,t3.int_col)]                                   
>                                                          |
> | |  |                                                                        
>                                                          |
> | |  02:SCAN HDFS [functional.alltypestiny t3]                                
>                                                          |
> | |     partitions=4/4 size=460B                                              
>                                                          |
> | |                                                                           
>                                                          |
> | 22:EXCHANGE [HASH(t2.int_col,t2.year)]                                      
>                                                          |
> | |                                                                           
>                                                          |
> | 16:CROSS JOIN [BROADCAST]                                                   
>                                                          |
> | |                                                                           
>                                                          |
> | |--21:EXCHANGE [BROADCAST]                                                  
>                                                          |
> | |  |                                                                        
>                                                          |
> | |  00:SCAN HDFS [functional.alltypesagg t1]                                 
>                                                          |
> | |     partitions=11/11 size=814.73KB                                        
>                                                          |
> | |                                                                           
>                                                          |
> | 01:SCAN HDFS [functional.alltypesagg t2]                                    
>                                                          |
> |    partitions=11/11 size=814.73KB                                           
>                                                          |
> +--------------------------------------------------------------------------------------------------------------------------------------+
> {code}
> And Postgres's simpler plan, which uses a NLIJ:
> {code}
> ------------
>  Aggregate  (cost=104609484.78..104609484.79 rows=1 width=4)
>    ->  Hash Full Join  (cost=1.20..104533859.78 rows=30250000 width=4)
>          Hash Cond: ((t2.int_col = t3.year) AND (t2.year = t3.int_col))
>          Filter: ((NOT (SubPlan 1)) AND (NOT (SubPlan 4)))
>          ->  Nested Loop  (cost=0.00..1513041.50 rows=121000000 width=23)
>                ->  Seq Scan on alltypesagg t1  (cost=0.00..257.00 rows=11000 
> width=9)
>                ->  Materialize  (cost=0.00..312.00 rows=11000 width=14)
>                      ->  Seq Scan on alltypesagg t2  (cost=0.00..257.00 
> rows=11000 width=14)
>          ->  Hash  (cost=1.08..1.08 rows=8 width=28)
>                ->  Seq Scan on alltypestiny t3  (cost=0.00..1.08 rows=8 
> width=28)
>          SubPlan 1
>            ->  Nested Loop  (cost=13.13..122.84 rows=1 width=4)
>                  ->  Index Scan using functional_alltypes_id on alltypes tt1  
> (cost=0.28..8.30 rows=1 width=4)
>                        Index Cond: (t3.int_col = id)
>                  ->  Bitmap Heap Scan on alltypes tt2  (cost=12.84..114.48 
> rows=6 width=4)
>                        Recheck Cond: (month = tt1.year)
>                        Filter: ((t3.tinyint_col = tinyint_col) AND 
> (t3.bigint_col = bigint_col))
>                        ->  Bitmap Index Scan on functional_alltypes_month  
> (cost=0.00..12.84 rows=608 width=0)
>                              Index Cond: (month = tt1.year)
>          SubPlan 4
>            ->  Seq Scan on alltypestiny ttt1  (cost=775.99..777.21 rows=1 
> width=12)
>                  Filter: ((month IS NULL) AND (month IS NOT NULL) AND 
> (t2.smallint_col = bigint_col) AND (NOT (hashed SubPlan 2)) AND (NOT (hashed 
> SubPlan 3)) AND (t1.tinyint_col = (month)
> ::numeric))
>                  SubPlan 2
>                    ->  Hash Right Join  (cost=255.25..746.00 rows=11000 
> width=4)
>                          Hash Cond: (tttt1.year = tttt2.id)
>                          ->  Seq Scan on alltypesagg tttt1  
> (cost=0.00..257.00 rows=11000 width=4)
>                          ->  Hash  (cost=164.00..164.00 rows=7300 width=8)
>                                ->  Seq Scan on alltypes tttt2  
> (cost=0.00..164.00 rows=7300 width=8)
>                  SubPlan 3
>                    ->  Aggregate  (cost=2.47..2.48 rows=1 width=30)
>                          ->  Hash Join  (cost=1.18..2.37 rows=8 width=30)
>                                Hash Cond: (tttt1_1.smallint_col = 
> tttt2_1.month)
>                                ->  Seq Scan on alltypestiny tttt1_1  
> (cost=0.00..1.08 rows=8 width=6)
>                                ->  Hash  (cost=1.08..1.08 rows=8 width=28)
>                                      ->  Seq Scan on alltypestiny tttt2_1  
> (cost=0.00..1.08 rows=8 width=28)
> (35 rows)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

Reply via email to