[
https://issues.apache.org/jira/browse/HIVE-21802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16850659#comment-16850659
]
Oleksiy Sayankin commented on HIVE-21802:
-----------------------------------------
*ROOT-CAUSE*
This is side effect of
[HIVE-14861|https://issues.apache.org/jira/browse/HIVE-14861] {{Support
precedence for set operator using parentheses}} which is a part of common task
[HIVE-12764|https://issues.apache.org/jira/browse/HIVE-12764] {{Support
Intersect (distinct/all) Except (distinct/all) Minus (distinct/all) in Hive}}.
There was no intentional idea to move {{ORDER BY}} clause before {{WINDOW}}
clause in Hive parser, but it was done in scope of the
[HIVE-14861|https://issues.apache.org/jira/browse/HIVE-14861] in Apache
Hive-2.2.0. Compare two parsers definition used for automatic java code
generation:
\\
\\
* Hive-1.2
{code}
selectStatement[boolean topLevel]
:
(
s=selectClause
f=fromClause?
w=whereClause?
g=groupByClause?
h=havingClause?
o=orderByClause?
c=clusterByClause?
d=distributeByClause?
sort=sortByClause?
win=window_clause?
{code}
* Hive-2.3
{code}
atomSelectStatement
:
s=selectClause
f=fromClause?
w=whereClause?
g=groupByClause?
h=havingClause?
o=orderByClause?
win=window_clause?
-> ^(TOK_QUERY $f? ^(TOK_INSERT ^(TOK_DESTINATION ^(TOK_DIR TOK_TMP_FILE))
$s $w? $g? $h? $win?))
|
LPAREN! selectStatement RPAREN!
;
selectStatement
:
a=atomSelectStatement
set=setOpSelectStatement[$atomSelectStatement.tree]?
o=orderByClause?
{code}
You see that {{o=orderByClause?}} is placed _before_ {{win=window_clause?}} in
Hive-1.2 in {{selectStatement}} and this makes the query work in Hive-1.2.
But in Hive-2.3 {{selectStatement}} was split into two parts:
{{atomSelectStatement}} and {{selectStatement}} and {{o=orderByClause?}} clause
is placed after {{a=atomSelectStatement}} which in its own turn contains
{{win=window_clause?}}
*SOLUTION*
Add additional {{o=orderByClause?}} before {{win=window_clause?}} in
{{atomSelectStatement}} to make Hive parser understand {{ORDER BY}} before
{{WINDOW}}. I did not remove {{o=orderByClause?}} from {{selectStatement}} in
order not to break backward compatibility.
*EFFECTS*
1. Hive Parser ({{ORDER BY}} clause)
2. Hive {{WINDOW}} functions
3. Precedence for set operator using parentheses
> Unexpected change in HiveQL clause order
> ----------------------------------------
>
> Key: HIVE-21802
> URL: https://issues.apache.org/jira/browse/HIVE-21802
> Project: Hive
> Issue Type: Bug
> Components: Parser, Query Processor
> Reporter: Oleksiy Sayankin
> Assignee: Oleksiy Sayankin
> Priority: Critical
>
> This query worked in Hive 1.2 ( ({{ORDER}} clause _before_ {{WINDOW}})):
> {code:java}
> CREATE TABLE ccdp_v02 AS
> SELECT * from
> (select
> cust_xref_id,
> cstone_last_updatetm,
> instal_late_pay_ct,
> ROW_NUMBER() over w1 as RN,
> a.report_dt
> from cstonedb3.gam_ccdp_us a where report_dt = '2019-05-01'
> and cust_xref_id in (1234)
> order by cust_xref_id, a.report_dt, cstone_last_updatetm desc
> WINDOW w1 as (partition by a.cust_xref_id, a.report_dt order by
> a.cstone_last_updatetm desc)
> ) tmp where RN=1 DISTRIBUTE BY report_dt;
> {code}
> In Hive2.1 it fails with:
> {code:java}
> hive> SELECT id
> > FROM (
> > SELECT
> > id,
> > a1,
> > ROW_NUMBER() OVER w1 AS RN,
> > b1
> > FROM i a
> > ORDER BY id, b1, a1 DESC
> > WINDOW w1 as (PARTITION BY id, b1 ORDER BY a1 DESC)
> > );
> NoViableAltException(257@[])
> at
> org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.atomjoinSource(HiveParser_FromClauseParser.java:2269)
> at
> org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.joinSource(HiveParser_FromClauseParser.java:2479)
> at
> org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.fromSource(HiveParser_FromClauseParser.java:1692)
> at
> org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.fromClause(HiveParser_FromClauseParser.java:1313)
> at
> org.apache.hadoop.hive.ql.parse.HiveParser.fromClause(HiveParser.java:42092)
> at
> org.apache.hadoop.hive.ql.parse.HiveParser.atomSelectStatement(HiveParser.java:36765)
> at
> org.apache.hadoop.hive.ql.parse.HiveParser.selectStatement(HiveParser.java:37017)
> at
> org.apache.hadoop.hive.ql.parse.HiveParser.regularBody(HiveParser.java:36663)
> at
> org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpressionBody(HiveParser.java:35852)
> at
> org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:35740)
> at
> org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:2307)
> at
> org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1335)
> at
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:208)
> at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:77)
> at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:70)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:468)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
> at
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:187)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:409)
> at
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:836)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:774)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:697)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:692)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> FAILED: ParseException line 3:4 cannot recognize input near '(' 'SELECT' 'id'
> in joinSource
> hive>
> {code}
> *STEPS TO REPRODUCE:*
> 1. Create a table:
> {code:java}
> CREATE TABLE i (id INT, a1 INT, b1 BOOLEAN);
> {code}
> 2. Run the query which was working in Hive-1.2: ({{ORDER}} clause _before_
> {{WINDOW}})
> {code:java}
> SELECT id
> FROM (
> SELECT
> id,
> a1,
> ROW_NUMBER() OVER w1 AS rn,
> b1
> FROM i a
> ORDER BY id, b1, a1 DESC
> WINDOW w1 as (PARTITION BY id, b1 ORDER BY a1 DESC)
> ) tmp WHERE rn=1 DISTRIBUTE BY id;
> {code}
> *ACTUAL RESULT:*
> The query fails with an exception you can find above.
> The query from Step 2 which works for Hive-2.3 is ( ({{ORDER}} clause _after_
> {{WINDOW}})):
> {code:java}
> SELECT id
> FROM (
> SELECT
> id,
> a1,
> ROW_NUMBER() OVER w1 AS rn,
> b1
> FROM i a
> WINDOW w1 as (PARTITION BY id, b1 ORDER BY a1 DESC)
> ORDER BY id, b1, a1 DESC
> ) tmp WHERE rn=1 DISTRIBUTE BY id;
> {code}
> Hive-2.3 also fails to parse subquery ( ({{ORDER}} clause _before_
> {{WINDOW}})):
> {code:java}
> SELECT
> id,
> a1,
> ROW_NUMBER() OVER w1 AS rn,
> b1
> FROM i a
> ORDER BY id, b1, a1 DESC
> WINDOW w1 as (PARTITION BY id, b1 ORDER BY a1 DESC)
> {code}
> Customer is facing issue regularly this is occurring after upgrade . the
> workaround given by us need lot of code changes which will get them back to
> squire one they need to follow process and will cost them a lot.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)