[ https://issues.apache.org/jira/browse/IMPALA-7785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16669417#comment-16669417 ]
Philip Zeyliger commented on IMPALA-7785:
-----------------------------------------

Yep, definitely makes sense. Definitely a bug.

{{GROUP BY 1}} does seem to work, which isn't totally surprising.

If you have access to customer behavior, it'd be interesting to see if people set "EXPR_REWRITES=0" to work around these issues as a barometer of priority.

> GROUP BY clause not analyzed prior to rewrite step
> --------------------------------------------------
>
>                 Key: IMPALA-7785
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7785
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 3.0
>            Reporter: Paul Rogers
>            Priority: Minor
>
> The FE fails to analyze a {{GROUP BY}} clause prior to invoking the rewrite
> rules, causing the rules to fail to do any rewrites.
> For the {{SELECT}} list, the analyzer processes each expression and marks it
> as analyzed.
> The rewrite rules, however, tend to skip unanalyzed nodes. (And, according to
> IMPALA-7754, often are not re-analyzed after a rewrite.)
> Consider this simple query:
> {code:sql}
> SELECT case when string_col is not null then string_col else 'foo' end
> FROM functional.alltypestiny
> GROUP BY case when string_col is not null then string_col else 'foo' end
> {code}
> This query works. Now, using the new feature in IMPALA-7655 with a query that
> will be rewritten to the above:
> {code:sql}
> SELECT coalesce(string_col, 'foo')
> FROM functional.alltypes
> GROUP BY coalesce(string_col, 'foo')
> {code}
> The above is rewritten using the new conditional function rewrite rules.
> Result:
> {noformat}
> org.apache.impala.common.AnalysisException:
> select list expression not produced by aggregation output
> (missing from GROUP BY clause?):
> CASE WHEN string_col IS NOT NULL THEN string_col ELSE 'foo' END
> {noformat}
> The reason is the check used in multiple rewrite rules:
> {code:java}
>   public Expr apply(Expr expr, Analyzer analyzer) throws AnalysisException {
>     if (!expr.isAnalyzed()) return expr;
> {code}
> Step through the code. The {{coalesce()}} expression in the {{SELECT}} clause
> is analyzed, the one in the {{GROUP BY}} is not. This creates a problem
> because SQL semantics require the identical expression in both clauses for
> them to match. (It also means no other rewrite rules, at least not those with
> this check, are invoked, leading to an unintended code path.)
> This query makes it a bit clearer:
> {code:sql}
> SELECT 1 + 2
> FROM functional.alltypestiny
> GROUP BY 1 + 2
> {code}
> This works. But, if we use test code to inspect the "rewritten" {{GROUP BY}},
> we find that it is still at "1 + 2" while the {{SELECT}} expression has been
> rewritten to "3".
> Seems that, when working with rewrites, we must be very careful because, as
> the code currently is written, we rewrite some clauses but not others. Then,
> we have to know when it is safe to have the {{SELECT}} clause differ from the
> {{GROUP BY}} clause. (Looks like it is OK for constants to differ, but not for
> functions...)
> VERY confusing, would be better to just fix the darn thing.
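
For anyone following along, the mismatch described in the issue can be reproduced in miniature. The sketch below is a toy model only, not Impala code: {{ToyExpr}}, {{apply}} and {{matchesGroupBy}} are hypothetical names invented for illustration, and the only thing borrowed from the issue is the "return unanalyzed expressions unchanged" guard. It assumes the aggregation check boils down to comparing the SELECT expression with the GROUP BY expression after rewriting.

```java
public class GroupByRewriteSketch {
    /** Toy expression: SQL text plus an "analyzed" flag. Not Impala's Expr class. */
    static final class ToyExpr {
        final String sql;
        final boolean analyzed;

        ToyExpr(String sql, boolean analyzed) {
            this.sql = sql;
            this.analyzed = analyzed;
        }
    }

    /** Hypothetical rule with the guard from the issue: unanalyzed input is returned as-is. */
    static ToyExpr apply(ToyExpr expr) {
        if (!expr.analyzed) return expr;
        if (expr.sql.equals("coalesce(string_col, 'foo')")) {
            return new ToyExpr(
                "CASE WHEN string_col IS NOT NULL THEN string_col ELSE 'foo' END", true);
        }
        return expr;
    }

    /** Stand-in for the aggregation check: SELECT and GROUP BY expressions must be identical. */
    static boolean matchesGroupBy(ToyExpr selectExpr, ToyExpr groupByExpr) {
        return selectExpr.sql.equals(groupByExpr.sql);
    }

    public static void main(String[] args) {
        String sql = "coalesce(string_col, 'foo')";
        ToyExpr select = apply(new ToyExpr(sql, true));

        // GROUP BY left unanalyzed: the rule skips it, so the two clauses diverge.
        System.out.println(matchesGroupBy(select, apply(new ToyExpr(sql, false)))); // false

        // GROUP BY analyzed before rewriting: both clauses are rewritten identically.
        System.out.println(matchesGroupBy(select, apply(new ToyExpr(sql, true)))); // true
    }
}
```

In this model the failure disappears once the GROUP BY expression is analyzed before the rules run, which matches the direction suggested by the issue title. Whether that is the right fix in the actual frontend is not something the sketch can show.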