[
https://issues.apache.org/jira/browse/HIVE-27649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758701#comment-17758701
]
Nicolas Richard commented on HIVE-27649:
----------------------------------------
I investigated what was going on. In
https://issues.apache.org/jira/browse/HIVE-21980, the atomjoinSource rule of the grammar
[changed a bit|https://github.com/apache/hive/commit/0f39030c3d33b11ae9c14ac81e047b44e8695371]
from:
{code:java}
atomjoinSource
@init { gParent.pushMsg("joinSource", state); }
@after { gParent.popMsg(state); }
:
tableSource (lateralView^)*
|
virtualTableSource (lateralView^)*
|
(subQuerySource) => subQuerySource (lateralView^)*
|
partitionedTableFunction (lateralView^)*
|
LPAREN! joinSource RPAREN!
;{code}
to
{code:java}
atomjoinSource
@init { gParent.pushMsg("joinSource", state); }
@after { gParent.popMsg(state); }
: tableSource (lateralView^)*
| virtualTableSource (lateralView^)*
| (LPAREN (KW_WITH|KW_SELECT|KW_MAP|KW_REDUCE|KW_FROM)) => subQuerySource (lateralView^)*
| (LPAREN LPAREN atomSelectStatement RPAREN setOperator ) => subQuerySource (lateralView^)*
| partitionedTableFunction (lateralView^)*
| LPAREN! joinSource RPAREN!
;{code}
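When the query is parsed, the (LPAREN LPAREN atomSelectStatement RPAREN
setOperator) predicate does not match because atomSelectStatement, by
definition, cannot contain ORDER BY, SORT BY, CLUSTER BY, DISTRIBUTE BY or
LIMIT clauses. The parser falls through to LPAREN! joinSource RPAREN! and
ends up in the subQuerySource rule for the first branch alone, as the stack
trace in the description shows. An exception is thrown because subQuerySource
requires an identifier, which is neither present nor needed in this
particular scenario.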
I tested it locally, and changing _atomSelectStatement_ to _selectStatement_
solves the issue. However, I still need to validate that it has no side
effects by running the whole test suite.
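For reference, a minimal sketch of that change, assuming the rest of the rule
stays exactly as it is today. It only illustrates the one-token swap described
above and has not been validated against the full test suite:
{code:java}
atomjoinSource
@init { gParent.pushMsg("joinSource", state); }
@after { gParent.popMsg(state); }
: tableSource (lateralView^)*
| virtualTableSource (lateralView^)*
| (LPAREN (KW_WITH|KW_SELECT|KW_MAP|KW_REDUCE|KW_FROM)) => subQuerySource (lateralView^)*
// Assumed change: selectStatement instead of atomSelectStatement, so that the
// predicate also matches a first branch that ends with ORDER BY, SORT BY,
// CLUSTER BY, DISTRIBUTE BY or LIMIT.
| (LPAREN LPAREN selectStatement RPAREN setOperator ) => subQuerySource (lateralView^)*
| partitionedTableFunction (lateralView^)*
| LPAREN! joinSource RPAREN!
;{code}
With the predicate matching, the whole parenthesized union would be handed to
subQuerySource as a single subquery, so the trailing alias (subq in the
reported query) is consumed as the identifier that rule expects.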
> Subqueries with a set operator do not support order by clauses
> --------------------------------------------------------------
>
> Key: HIVE-27649
> URL: https://issues.apache.org/jira/browse/HIVE-27649
> Project: Hive
> Issue Type: Bug
> Components: Parser
> Affects Versions: 3.1.2, 4.0.0
> Reporter: Nicolas Richard
> Priority: Major
>
> Consider the following query:
> {code:java}
> select key from ((select key from src order by key) union (select key from src))subq {code}
> Up until 3.1.2, Hive would parse this query without any problems. However, if
> you try it on the latest versions, you'll get the following exception:
> {code:java}
> org.apache.hadoop.hive.ql.parse.ParseException: line 1:60 cannot recognize input near 'union' '(' 'select' in subquery source
> at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:125)
> at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:97) {code}
> With the inner exception stack trace being:
> {code:java}
> NoViableAltException(367@[])
> at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.identifier(HiveParser_IdentifiersParser.java:14006)
> at org.apache.hadoop.hive.ql.parse.HiveParser.identifier(HiveParser.java:45086)
> at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.subQuerySource(HiveParser_FromClauseParser.java:5411)
> at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.atomjoinSource(HiveParser_FromClauseParser.java:1921)
> at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.joinSource(HiveParser_FromClauseParser.java:2175)
> at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.atomjoinSource(HiveParser_FromClauseParser.java:2110)
> at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.joinSource(HiveParser_FromClauseParser.java:2175)
> at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.fromSource(HiveParser_FromClauseParser.java:1750)
> at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.fromClause(HiveParser_FromClauseParser.java:1593)
> at org.apache.hadoop.hive.ql.parse.HiveParser.fromClause(HiveParser.java:45094)
> at org.apache.hadoop.hive.ql.parse.HiveParser.atomSelectStatement(HiveParser.java:38538)
> at org.apache.hadoop.hive.ql.parse.HiveParser.selectStatement(HiveParser.java:38831)
> at org.apache.hadoop.hive.ql.parse.HiveParser.regularBody(HiveParser.java:38424)
> at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpressionBody(HiveParser.java:37686)
> at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:37574)
> at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:2757)
> at org.apache.hadoop.hive.ql.parse.HiveParser.explainStatement(HiveParser.java:1751)
> at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1614)
> at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:123)
> at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:97)
> {code}
> Note that this behavior also happens if the subquery contains a SORT BY,
> CLUSTER BY, DISTRIBUTE BY or LIMIT clause.
>