[ 
https://issues.apache.org/jira/browse/IMPALA-8790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908134#comment-16908134
 ] 

Quanlong Huang commented on IMPALA-8790:
----------------------------------------

We can work around the bug by rewriting the query to avoid putting the analytic 
functions and the aggregations in the same query block:
{code:sql}
explain select uid, cid,
   rank() over (partition by uid order by cnt desc)
from (select uid, cid, count(*) as cnt from foo) w;
{code}
Impala has an optimization for analytic functions if its query block contains 
group-by expressions of aggregations in the same time. It will trim the 
exchange slots according to the partition-by and the group-by slots. See this 
[commit|https://github.com/apache/impala/commit/0b3124ab35402f1ab8141e8ffcca28a5a481c81e]
 for more details.

In IMPALA-110, there's a bug using wrong group-by expressions referencing the 
non-materialized slots (of the inline view) in this optimization. For the 
rewritten query, it avoids putting the analytic function rank() and the 
aggregation (count\(*\) group by ...) in the same query block. So the above 
optimization is bypassed, which avoids the bug.

Patch for review: https://gerrit.cloudera.org/c/14063

> IllegalStateException: Illegal reference to non-materialized slot
> -----------------------------------------------------------------
>
>                 Key: IMPALA-8790
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8790
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Major
>         Attachments: foo.parq
>
>
> Reproduce:
> {code:sql}
> $ hdfs dfs -put foo.parq hdfs:///tmp
> impala> create table foo (uid string, cid string) stored as parquet;
> impala> load data inpath 'hdfs:///tmp/foo.parq' into table foo;
> {code}
> With the stats, the following query hits an IllegalStateException:
> {code:sql}
> impala> compute stats foo;
> impala> explain select uid, cid,
>    rank() over (partition by uid order by count(*) desc)
> from (select uid, cid from foo) w
> group by uid, cid;
> ERROR: IllegalStateException: Illegal reference to non-materialized slot: 
> tid=1 sid=2{code}
> Without the stats, it runs successfully:
> {code:sql}
> impala> drop stats foo;
> impala> explain select uid, cid,
>    rank() over (partition by uid order by count(*) desc)
> from (select uid, cid from foo) w
> group by uid, cid;
> +------------------------------------------------------------------------------------+
> | Explain String                                                              
>        |
> +------------------------------------------------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=84.02MB Threads=5                 
>        |
> | Per-Host Resource Estimates: Memory=304MB                                   
>        |
> | WARNING: The following tables are missing relevant table and/or column 
> statistics. |
> | common_action.foo                                                           
>        |
> |                                                                             
>        |
> | PLAN-ROOT SINK                                                              
>        |
> | |                                                                           
>        |
> | 07:EXCHANGE [UNPARTITIONED]                                                 
>        |
> | |                                                                           
>        |
> | 03:ANALYTIC                                                                 
>        |
> | |  functions: rank()                                                        
>        |
> | |  partition by: uid                                                        
>        |
> | |  order by: count(*) DESC                                                  
>        |
> | |  window: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW                
>        |
> | |  row-size=40B cardinality=1.10K                                           
>        |
> | |                                                                           
>        |
> | 02:SORT                                                                     
>        |
> | |  order by: uid ASC NULLS FIRST, count(*) DESC                             
>        |
> | |  row-size=32B cardinality=1.10K                                           
>        |
> | |                                                                           
>        |
> | 06:EXCHANGE [HASH(uid)]                                                     
>        |
> | |                                                                           
>        |
> | 05:AGGREGATE [FINALIZE]                                                     
>        |
> | |  output: count:merge(*)                                                   
>        |
> | |  group by: uid, cid                                                       
>        |
> | |  row-size=32B cardinality=1.10K                                           
>        |
> | |                                                                           
>        |
> | 04:EXCHANGE [HASH(uid,cid)]                                                 
>        |
> | |                                                                           
>        |
> | 01:AGGREGATE [STREAMING]                                                    
>        |
> | |  output: count(*)                                                         
>        |
> | |  group by: uid, cid                                                       
>        |
> | |  row-size=32B cardinality=1.10K                                           
>        |
> | |                                                                           
>        |
> | 00:SCAN HDFS [common_action.foo]                                            
>        |
> |    HDFS partitions=1/1 files=1 size=5.19KB                                  
>        |
> |    row-size=24B cardinality=1.10K                                           
>        |
> +------------------------------------------------------------------------------------+
> Fetched 37 row(s) in 0.03s
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

Reply via email to