[ 
https://issues.apache.org/jira/browse/IMPALA-7785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated IMPALA-7785:
--------------------------------
    Description: 
The FE does not analyze the {{GROUP BY}} clause before invoking the rewrite 
rules, so the rules perform no rewrites on {{GROUP BY}} expressions.

For the {{SELECT}} list, the analyzer processes each expression and marks it as 
analyzed.

The rewrite rules, however, tend to skip unanalyzed nodes. (And, according to 
IMPALA-7754, nodes are often not re-analyzed after a rewrite.)

Consider this simple query:

{code:sql}
SELECT case when string_col is not null then string_col else 'foo' end
FROM functional.alltypestiny
GROUP BY case when string_col is not null then string_col else 'foo' end
{code}

This query works. Now consider, with the new feature in IMPALA-7655, a query 
that will be rewritten to the above:

{code:sql}
SELECT coalesce(string_col, 'foo')                                    
FROM functional.alltypes                                                  
GROUP BY coalesce(string_col, 'foo')                                         
{code}

The above is rewritten using the new conditional function rewrite rules. Result:

{noformat}
org.apache.impala.common.AnalysisException:
  select list expression not produced by aggregation output
  (missing from GROUP BY clause?):
  CASE WHEN string_col IS NOT NULL THEN string_col ELSE 'foo' END
{noformat}

The reason is the check used in multiple rewrite rules:

{code:java}
  public Expr apply(Expr expr, Analyzer analyzer) throws AnalysisException {
    // Unanalyzed expressions are returned unchanged, so the rule is a no-op for them.
    if (!expr.isAnalyzed()) return expr;
{code}

Step through the code. The {{coalesce()}} expression in the {{SELECT}} clause is 
analyzed, but the one in the {{GROUP BY}} is not. Only the {{SELECT}} copy is 
therefore rewritten to {{CASE}}. This creates a problem because SQL semantics 
require the identical expression in both clauses for them to match. (It also 
means no other rewrite rules, at least not those with this check, are invoked, 
leading to an unintended code path.)
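
The mismatch can be reproduced outside Impala with a minimal, self-contained sketch. Everything here is an assumption made for illustration: {{ToyExpr}}, {{applyCoalesceRule}} and {{selectMatchesGroupBy}} are hypothetical names, not Impala classes, and a plain string comparison stands in for the FE's real expression matching.

```java
public class RewriteGuardSketch {
  /** Toy stand-in for an expression node: SQL text plus an "analyzed" flag. */
  static final class ToyExpr {
    final String sql;
    final boolean analyzed;

    ToyExpr(String sql, boolean analyzed) {
      this.sql = sql;
      this.analyzed = analyzed;
    }
  }

  /** Hypothetical rule: coalesce -> CASE, guarded the same way as the check above. */
  static ToyExpr applyCoalesceRule(ToyExpr expr) {
    // Same shape as the guard in the real rules: skip unanalyzed nodes.
    if (!expr.analyzed) return expr;
    if (expr.sql.equals("coalesce(string_col, 'foo')")) {
      return new ToyExpr(
          "CASE WHEN string_col IS NOT NULL THEN string_col ELSE 'foo' END", true);
    }
    return expr;
  }

  /** Rewrites both clauses and reports whether they still match textually. */
  public static boolean selectMatchesGroupBy(boolean groupByAnalyzed) {
    ToyExpr select =
        applyCoalesceRule(new ToyExpr("coalesce(string_col, 'foo')", true));
    ToyExpr groupBy =
        applyCoalesceRule(new ToyExpr("coalesce(string_col, 'foo')", groupByAnalyzed));
    return select.sql.equals(groupBy.sql);
  }

  public static void main(String[] args) {
    // GROUP BY left unanalyzed: only SELECT is rewritten, so the clauses diverge.
    System.out.println(selectMatchesGroupBy(false)); // prints "false"
    // Both analyzed: both are rewritten and still match.
    System.out.println(selectMatchesGroupBy(true)); // prints "true"
  }
}
```

In this toy model the clauses only diverge when the {{GROUP BY}} copy is left unanalyzed, which is the situation described above.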

This query makes it a bit clearer:

{code:sql}
SELECT 1 + 2
FROM functional.alltypestiny
GROUP BY 1 + 2
{code}

This works. But if we use test code to inspect the "rewritten" {{GROUP BY}}, 
we find that it is still "1 + 2", while the {{SELECT}} expression has been 
rewritten to "3".
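
As a rough illustration of the constant case, the sketch below folds "1 + 2" only when the expression is flagged as analyzed. {{foldIfAnalyzed}} is a hypothetical helper that works on SQL text; it is not Impala's constant-folding rule, and the boolean flag is an assumed stand-in for {{isAnalyzed()}}.

```java
public class ConstantFoldSketch {
  /**
   * Hypothetical folding rule for "a + b" over integer literals.
   * Returns the input unchanged when the expression is not analyzed.
   */
  public static String foldIfAnalyzed(String sql, boolean analyzed) {
    if (!analyzed) return sql; // same guard as in the rewrite rules
    String[] parts = sql.split("\\+");
    if (parts.length != 2) return sql;
    try {
      int sum = Integer.parseInt(parts[0].trim()) + Integer.parseInt(parts[1].trim());
      return Integer.toString(sum);
    } catch (NumberFormatException e) {
      return sql; // not two integer literals; leave as-is
    }
  }

  public static void main(String[] args) {
    // SELECT expression: analyzed, so it is folded.
    System.out.println(foldIfAnalyzed("1 + 2", true)); // prints "3"
    // GROUP BY expression: not analyzed, so it is left alone.
    System.out.println(foldIfAnalyzed("1 + 2", false)); // prints "1 + 2"
  }
}
```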

It seems that, when working with rewrites, we must be very careful because, as 
the code is currently written, we rewrite some clauses but not others. We then 
have to know when it is safe for the {{SELECT}} clause to differ from the 
{{GROUP BY}} clause. (It looks like it is OK for constants to differ, but not 
for functions...)

VERY confusing; it would be better to just fix the darn thing.

  was:
The FE cannot handle a {{CASE}} statement in a {{GROUP BY}} clause. As a 
result, the change in IMPALA-7655 cannot be applied to queries with such a 
clause for fear of ending up in the situation shown later.

Consider this simple query:

{code:sql}
SELECT case when string_col is not null then string_col else 'foo' end
FROM functional.alltypestiny
GROUP BY case when string_col is not null then string_col else 'foo' end
{code}

The above will fail with the following:

{noformat}
org.apache.impala.common.AnalysisException:
  select list expression not produced by aggregation output
  (missing from GROUP BY clause?):
  CASE WHEN string_col IS NOT NULL THEN string_col ELSE 'foo' END
{noformat}

This then causes the rewrites in IMPALA-7655 to fail:

{code:sql}
SELECT coalesce(string_col, 'foo')                                    
FROM functional.alltypes                                                  
GROUP BY coalesce(string_col, 'foo')                                         
{code}

The above is rewritten using the new conditional function rewrite rules. Result:

{noformat}
org.apache.impala.common.AnalysisException:
  select list expression not produced by aggregation output
  (missing from GROUP BY clause?):
  CASE WHEN string_col IS NOT NULL THEN string_col ELSE 'foo' END
{noformat}

        Summary: GROUP BY clause not analyzed prior to rewrite step  (was: 
GROUP BY clause cannot contain a CASE statement)

> GROUP BY clause not analyzed prior to rewrite step
> --------------------------------------------------
>
>                 Key: IMPALA-7785
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7785
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 3.0
>            Reporter: Paul Rogers
>            Priority: Minor



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
