This is an automated email from the ASF dual-hosted git repository.
jinwoo pushed a commit to branch develop
in repository https://gitbox.apache.org/repos/asf/geode.git
The following commit(s) were added to refs/heads/develop by this push:
new dbdec41174 [GEODE-10463] Fix lexical nondeterminism warning in OQL
grammar between ALL_UNICODE and DIGIT rules (#7928)
dbdec41174 is described below
commit dbdec41174b127d2304fdebba6b70f153e543081
Author: Jinwoo Hwang <[email protected]>
AuthorDate: Mon Sep 29 05:08:00 2025 -0400
[GEODE-10463] Fix lexical nondeterminism warning in OQL grammar between
ALL_UNICODE and DIGIT rules (#7928)
* GEODE-10463: Fix lexical nondeterminism warning in OQL grammar between
ALL_UNICODE and DIGIT rules
Refactored the ALL_UNICODE rule to exclude the Unicode digit ranges that
overlap with the DIGIT rule, eliminating the lexical ambiguity in
RegionNameCharacter. The ALL_UNICODE range is now split into 15
non-overlapping segments that exclude the Arabic-Indic, Devanagari, Bengali,
and other Unicode digit ranges. This ensures deterministic tokenization:
Unicode digits are always matched by the DIGIT rule, while all other Unicode
characters are matched by ALL_UNICODE.
* GEODE-10463: Add clarifying comment for ALL_UNICODE lexer rule
Add documentation comment to explain that the ALL_UNICODE character
class excludes Unicode digit ranges to prevent lexical nondeterminism
with the DIGIT rule in the OQL grammar lexer.
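The split described above can be sanity-checked outside ANTLR. The following is a minimal sketch, not part of the commit: it copies the 15 segments from the new ALL_UNICODE rule in the diff and derives the code point ranges that fall between them. The names `ALL_UNICODE_SEGMENTS`, `gaps_between`, and `in_all_unicode` are hypothetical helpers introduced for illustration only. The sketch does not read the DIGIT rule itself, so it only shows which ranges ALL_UNICODE no longer covers.

```python
# Inclusive code point segments copied from the ALL_UNICODE rule in the diff.
ALL_UNICODE_SEGMENTS = [
    (0x0061, 0x065F),
    (0x066A, 0x06EF),
    (0x06FA, 0x0965),
    (0x0970, 0x09E5),
    (0x09F0, 0x0A65),
    (0x0A70, 0x0AE5),
    (0x0AF0, 0x0B65),
    (0x0B70, 0x0BE6),
    (0x0BF0, 0x0C65),
    (0x0C70, 0x0CE5),
    (0x0CF0, 0x0D65),
    (0x0D70, 0x0E4F),
    (0x0E5A, 0x0ECF),
    (0x0EDA, 0x103F),
    (0x104A, 0xFFFD),
]


def gaps_between(segments):
    """Return the inclusive ranges left out between consecutive segments.

    Raises ValueError if the segments overlap or are not in ascending order.
    """
    gaps = []
    for (_, prev_end), (next_start, _) in zip(segments, segments[1:]):
        if next_start <= prev_end:
            raise ValueError("segments overlap or are out of order")
        if next_start > prev_end + 1:
            gaps.append((prev_end + 1, next_start - 1))
    return gaps


def in_all_unicode(ch):
    """Return True if the single character ch falls in one of the segments."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ALL_UNICODE_SEGMENTS)


# Ranges that the refactored rule no longer matches.
EXCLUDED = gaps_between(ALL_UNICODE_SEGMENTS)

if __name__ == "__main__":
    for lo, hi in EXCLUDED:
        print(f"U+{lo:04X}..U+{hi:04X}")
```

Running the script lists the excluded ranges, which can then be compared by hand against the alternatives of the DIGIT rule in oql.g.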
---
.../org/apache/geode/cache/query/internal/parse/oql.g | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)
diff --git a/geode-core/src/main/antlr/org/apache/geode/cache/query/internal/parse/oql.g b/geode-core/src/main/antlr/org/apache/geode/cache/query/internal/parse/oql.g
index ec7142b4b6..cdd1623333 100644
--- a/geode-core/src/main/antlr/org/apache/geode/cache/query/internal/parse/oql.g
+++ b/geode-core/src/main/antlr/org/apache/geode/cache/query/internal/parse/oql.g
@@ -133,8 +133,23 @@ DIGIT : ('\u0030'..'\u0039' |
'\u1040'..'\u1049')
;
+// Exclude Unicode digit ranges to prevent lexical nondeterminism with DIGIT rule
protected
-ALL_UNICODE : ('\u0061'..'\ufffd')
+ALL_UNICODE : ('\u0061'..'\u065f' | // exclude Arabic-Indic digits
+ '\u066a'..'\u06ef' | // exclude Extended Arabic-Indic digits
+ '\u06fa'..'\u0965' | // exclude Devanagari digits
+ '\u0970'..'\u09e5' | // exclude Bengali digits
+ '\u09f0'..'\u0a65' | // exclude Gurmukhi digits
+ '\u0a70'..'\u0ae5' | // exclude Gujarati digits
+ '\u0af0'..'\u0b65' | // exclude Oriya digits
+ '\u0b70'..'\u0be6' | // exclude Tamil digits (note: Tamil starts at 0be7)
+ '\u0bf0'..'\u0c65' | // exclude Telugu digits
+ '\u0c70'..'\u0ce5' | // exclude Kannada digits
+ '\u0cf0'..'\u0d65' | // exclude Malayalam digits
+ '\u0d70'..'\u0e4f' | // exclude Thai digits
+ '\u0e5a'..'\u0ecf' | // exclude Lao digits
+ '\u0eda'..'\u103f' | // exclude Myanmar digits
+ '\u104a'..'\ufffd') // rest of Unicode
;
/*