[
https://issues.apache.org/jira/browse/IMPALA-14551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18047995#comment-18047995
]
woosuk.ro commented on IMPALA-14551:
------------------------------------
h3. Root Cause Analysis: ANTLR 3.3 vs 3.4+
The hang occurs in {{Lexer.nextToken()}} when HiveLexer encounters Unicode
characters.
h4. ANTLR 3.3
([source|https://github.com/vlkv/antlr3/blob/98b410d62224ce5bdfbe73a919eaa9a299bb7d6b/runtime/Java/src/main/java/org/antlr/runtime/Lexer.java#L102])
{code:java}
catch (RecognitionException re) {
    reportError(re);
    // match() routine has already called recover()
}
{code}
Problem: in the {*}{{RecognitionException}}{*} handler, only {{reportError()}} is called.
Because {{recover()}} is never invoked, the input position does not advance, so {{nextToken()}} retries the same character indefinitely.
h4. ANTLR 3.4+
([source|https://github.com/antlr/antlr3/blob/fb4eb0e43212f4a7e841f4117b88c4b674b87698/runtime/Java/src/main/java/org/antlr/runtime/Lexer.java#L106])
{code:java}
catch (RecognitionException re) {
    reportError(re);
    recover(re); // throw out current char and try again
}
{code}
{code:java}
public void recover(RecognitionException re) {
    input.consume();
}
{code}
Fix: the *RecognitionException* handler now calls {{recover(re)}}, which calls {{input.consume()}}.
The lexer advances past the problematic character, so the loop terminates.
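The difference between the two handlers can be reproduced without ANTLR. The following is a minimal sketch, not the ANTLR or HiveLexer code: {{LexerRecoverySketch}}, {{ToyLexer}}, {{tokenize()}} and the {{maxAttempts}} budget are hypothetical names introduced for illustration, and the toy lexer is assumed to accept only ASCII letters. The attempt budget stands in for the hang so that the example terminates.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not ANTLR or HiveLexer code.
class LexerRecoverySketch {

    /** Toy lexer that (by assumption) recognizes only runs of ASCII letters. */
    static final class ToyLexer {
        private final String input;
        private final boolean recoverOnError; // false mimics 3.3, true mimics 3.4+
        private int pos = 0;
        int errors = 0;

        ToyLexer(String input, boolean recoverOnError) {
            this.input = input;
            this.recoverOnError = recoverOnError;
        }

        private static boolean isAsciiLetter(char c) {
            return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
        }

        /**
         * Returns the tokens, or null when the attempt budget is exhausted.
         * A null result stands in for the real-world infinite loop.
         */
        List<String> tokenize(int maxAttempts) {
            List<String> tokens = new ArrayList<>();
            int attempts = 0;
            while (pos < input.length()) {
                if (++attempts > maxAttempts) {
                    return null; // without the budget this loop would never exit
                }
                if (isAsciiLetter(input.charAt(pos))) {
                    int start = pos;
                    while (pos < input.length() && isAsciiLetter(input.charAt(pos))) {
                        pos++;
                    }
                    tokens.add(input.substring(start, pos));
                } else {
                    errors++;     // corresponds to reportError(re)
                    if (recoverOnError) {
                        pos++;    // corresponds to recover(re) -> input.consume()
                    }
                }
            }
            return tokens;
        }
    }

    public static void main(String[] args) {
        // "ab" + three Hangul syllables + "cd", written as escapes to avoid encoding issues
        String sql = "ab\uB204\uC801\uD569cd";
        System.out.println("with recover:    " + new ToyLexer(sql, true).tokenize(1000));
        System.out.println("without recover: " + new ToyLexer(sql, false).tokenize(1000));
    }
}
```

With {{recoverOnError}} set to false, the position never moves past the first unrecognized character, so only the attempt budget stops the loop. With it set to true, each unrecognized character is reported once and skipped.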
h4. Summary
||Version||{{RecognitionException}} handling||Result||
|3.3|{{reportError()}} only|Infinite loop ✗|
|3.4+|{{reportError()}} + {{recover()}}|Loop terminates ✓|
> Query hangs when selecting an expression that contains Unicode Letters
> -----------------------------------------------------------------------
>
> Key: IMPALA-14551
> URL: https://issues.apache.org/jira/browse/IMPALA-14551
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Reporter: woosuk.ro
> Assignee: woosuk.ro
> Priority: Major
> Fix For: Impala 4.4.0
>
>
> * Summary
> ** When a complex expression that contains Unicode Letters is selected
> without an alias, an error during alias mapping causes the query to hang
> before the planning phase. CANCEL QUERY is ineffective.
> * Environment
> ** Impala 4.4.0, Hive 3.1.3.
> * Steps to Reproduce
> ** In the SELECT list, use an expression containing Unicode Letters without
> an alias.
> ** Example: `select 누적합 - lag (누적합) over (partition by day order by day)
> from base`
> * Actual Behavior
> ** During alias mapping, invoking HiveLexer triggers repeated retries
> without consuming input, and the query hangs. Threads remain RUNNABLE and
> cannot be canceled.