(geode) branch develop updated: [GEODE-10463] Fix lexical nondeterminism warning in OQL grammar between ALL_UNICODE and DIGIT rules (#7928)

jinwoo Fri, 17 Oct 2025 19:19:41 -0700

This is an automated email from the ASF dual-hosted git repository.

jinwoo pushed a commit to branch develop
in repository https://gitbox.apache.org/repos/asf/geode.git



The following commit(s) were added to refs/heads/develop by this push:
     new dbdec41174 [GEODE-10463] Fix lexical nondeterminism warning in OQL grammar between ALL_UNICODE and DIGIT rules (#7928)
dbdec41174 is described below

commit dbdec41174b127d2304fdebba6b70f153e543081
Author: Jinwoo Hwang <[email protected]>
AuthorDate: Mon Sep 29 05:08:00 2025 -0400

    [GEODE-10463] Fix lexical nondeterminism warning in OQL grammar between ALL_UNICODE and DIGIT rules (#7928)
    
    * GEODE-10463: Fix lexical nondeterminism warning in OQL grammar between ALL_UNICODE and DIGIT rules
    
    Refactored ALL_UNICODE rule to exclude Unicode digit ranges that overlap
    with DIGIT rule, eliminating lexical ambiguity in RegionNameCharacter.
    The ALL_UNICODE range is now split into 15 non-overlapping segments that
    exclude Arabic-Indic, Devanagari, Bengali, and other Unicode digit ranges.
    
    This ensures deterministic tokenization where Unicode digits are always
    matched by DIGIT rule while other Unicode characters use ALL_UNICODE.
    
    * GEODE-10463: Add clarifying comment for ALL_UNICODE lexer rule
    
    Add documentation comment to explain that the ALL_UNICODE character
    class excludes Unicode digit ranges to prevent lexical nondeterminism
    with the DIGIT rule in the OQL grammar lexer.
---
 .../org/apache/geode/cache/query/internal/parse/oql.g   | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/geode-core/src/main/antlr/org/apache/geode/cache/query/internal/parse/oql.g b/geode-core/src/main/antlr/org/apache/geode/cache/query/internal/parse/oql.g
index ec7142b4b6..cdd1623333 100644
--- a/geode-core/src/main/antlr/org/apache/geode/cache/query/internal/parse/oql.g
+++ b/geode-core/src/main/antlr/org/apache/geode/cache/query/internal/parse/oql.g
@@ -133,8 +133,23 @@ DIGIT : ('\u0030'..'\u0039' |
        '\u1040'..'\u1049')
     ;
 
+// Exclude Unicode digit ranges to prevent lexical nondeterminism with DIGIT rule
 protected
-ALL_UNICODE : ('\u0061'..'\ufffd')     
+ALL_UNICODE : ('\u0061'..'\u065f' |   // exclude Arabic-Indic digits
+               '\u066a'..'\u06ef' |   // exclude Extended Arabic-Indic digits  
+               '\u06fa'..'\u0965' |   // exclude Devanagari digits
+               '\u0970'..'\u09e5' |   // exclude Bengali digits
+               '\u09f0'..'\u0a65' |   // exclude Gurmukhi digits
+               '\u0a70'..'\u0ae5' |   // exclude Gujarati digits
+               '\u0af0'..'\u0b65' |   // exclude Oriya digits
+               '\u0b70'..'\u0be6' |   // exclude Tamil digits (note: Tamil starts at 0be7)
+               '\u0bf0'..'\u0c65' |   // exclude Telugu digits
+               '\u0c70'..'\u0ce5' |   // exclude Kannada digits
+               '\u0cf0'..'\u0d65' |   // exclude Malayalam digits
+               '\u0d70'..'\u0e4f' |   // exclude Thai digits
+               '\u0e5a'..'\u0ecf' |   // exclude Lao digits
+               '\u0eda'..'\u103f' |   // exclude Myanmar digits
+               '\u104a'..'\ufffd')    // rest of Unicode
     ;
 
 /*
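
The split described in the commit message can be reproduced with a short script. The following is a minimal sketch in Python, not part of the commit: the helper names (split_around, as_antlr) are hypothetical, and the DIGIT ranges listed are inferred from the gaps between the new ALL_UNICODE segments in the diff, since only the last DIGIT range ('\u1040'..'\u1049') is visible in the hunk context. It subtracts the digit ranges from '\u0061'..'\ufffd' and should yield the same 15 segments.

```python
# Digit ranges above U+0061, inferred from the gaps in the diff (assumption:
# these match the DIGIT rule in oql.g; only 1040..1049 is visible in the hunk).
DIGIT_RANGES = [
    (0x0660, 0x0669),  # Arabic-Indic
    (0x06F0, 0x06F9),  # Extended Arabic-Indic
    (0x0966, 0x096F),  # Devanagari
    (0x09E6, 0x09EF),  # Bengali
    (0x0A66, 0x0A6F),  # Gurmukhi
    (0x0AE6, 0x0AEF),  # Gujarati
    (0x0B66, 0x0B6F),  # Oriya
    (0x0BE7, 0x0BEF),  # Tamil (starts at 0BE7, per the diff comment)
    (0x0C66, 0x0C6F),  # Telugu
    (0x0CE6, 0x0CEF),  # Kannada
    (0x0D66, 0x0D6F),  # Malayalam
    (0x0E50, 0x0E59),  # Thai
    (0x0ED0, 0x0ED9),  # Lao
    (0x1040, 0x1049),  # Myanmar
]


def split_around(lo, hi, excluded):
    """Return the inclusive sub-ranges of [lo, hi] not covered by `excluded`."""
    segments = []
    start = lo
    for ex_lo, ex_hi in sorted(excluded):
        if ex_hi < start or ex_lo > hi:
            continue  # excluded range lies entirely outside what is left
        if ex_lo > start:
            segments.append((start, ex_lo - 1))
        start = max(start, ex_hi + 1)
    if start <= hi:
        segments.append((start, hi))
    return segments


def as_antlr(segments):
    """Format segments as ANTLR 2 character-range alternatives."""
    return " |\n".join(f"'\\u{a:04x}'..'\\u{b:04x}'" for a, b in segments)


SEGMENTS = split_around(0x0061, 0xFFFD, DIGIT_RANGES)

if __name__ == "__main__":
    print(as_antlr(SEGMENTS))
```

Because every excluded range is removed in full, no code point can match both DIGIT and ALL_UNICODE, which is the property the nondeterminism warning was about. The ASCII digits '\u0030'..'\u0039' need no handling here because they already lie below the start of ALL_UNICODE.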
