JinwooHwang opened a new pull request, #7928:
URL: https://github.com/apache/geode/pull/7928

   ## Fix lexical nondeterminism warning in OQL grammar between `ALL_UNICODE` and `DIGIT` rules
   
   ### Problem
   
   The ANTLR grammar generation for the OQL (Object Query Language) parser 
emitted a **lexical nondeterminism warning** during builds.
   
   The warning came from the `RegionNameCharacter` lexer rule, where the 
character ranges of the `ALL_UNICODE` and `DIGIT` rules overlap.
   
   The `ALL_UNICODE` rule was defined as a broad range (`'\u0061'..'\ufffd'`) 
that included all the Unicode digit ranges explicitly defined in the `DIGIT` 
rule, creating lexical ambiguity.
   
   ### Root Cause
   When the lexer encountered Unicode digits (e.g., Arabic-Indic digits `٠-٩`, 
Devanagari digits `०-९`, etc.) in region names, it couldn't deterministically 
choose between:
   
   - Matching them as part of `ALL_UNICODE`
   - Matching them as `DIGIT` characters
   
   This created nondeterminism between alternatives 1 (`ALL_UNICODE`) and 3 
(`DIGIT`) in the `RegionNameCharacter` rule.
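
The overlap described above can be illustrated outside the grammar. The following is a minimal Java sketch, not code from `oql.g`: the class and method names are hypothetical, the `ALL_UNICODE` bounds are taken from the range quoted in this description, and only the two digit blocks named above (Arabic-Indic and Devanagari) are listed, whereas the real `DIGIT` rule is assumed to cover more.

```java
public final class RangeOverlapDemo {
    // Bounds of the broad ALL_UNICODE range as quoted in the description.
    static final int ALL_UNICODE_LO = 0x0061;
    static final int ALL_UNICODE_HI = 0xFFFD;

    // Assumption: only two of the digit blocks from the DIGIT rule,
    // Arabic-Indic (U+0660..U+0669) and Devanagari (U+0966..U+096F).
    static final int[][] DIGIT_RANGES = {
        {0x0660, 0x0669},
        {0x0966, 0x096F},
    };

    public static boolean inAllUnicode(int codePoint) {
        return codePoint >= ALL_UNICODE_LO && codePoint <= ALL_UNICODE_HI;
    }

    public static boolean inDigit(int codePoint) {
        for (int[] range : DIGIT_RANGES) {
            if (codePoint >= range[0] && codePoint <= range[1]) {
                return true;
            }
        }
        return false;
    }

    // A character is ambiguous when both alternatives could match it.
    public static boolean ambiguous(int codePoint) {
        return inAllUnicode(codePoint) && inDigit(codePoint);
    }

    public static void main(String[] args) {
        int[] samples = {0x0660, 0x0966, 0x00E9};
        for (int codePoint : samples) {
            System.out.printf("U+%04X ambiguous=%b%n", codePoint, ambiguous(codePoint));
        }
    }
}
```

Any code point for which `ambiguous` returns `true` is one that two alternatives of a rule like `RegionNameCharacter` could both claim, which is the condition the generator warns about.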
   
   ### Solution
   
   Refactored the `ALL_UNICODE` rule to **explicitly exclude all Unicode digit 
ranges** defined in the `DIGIT` rule, eliminating character range overlap.
   
   This ensures:
   
   - Unicode digits are only matched by the `DIGIT` rule
   - `ALL_UNICODE` covers all other Unicode characters without overlap
   - The lexer can deterministically choose the appropriate token type
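
To make the exclusion concrete, here is a hedged Java sketch of how a broad range can be split into sub-ranges that skip a set of digit blocks. It illustrates the idea only and is not the actual grammar change: `RangeExclusionSketch` and `subtract` are hypothetical names, and the two excluded blocks are the ones mentioned earlier in this description, not the full list from the `DIGIT` rule.

```java
import java.util.ArrayList;
import java.util.List;

public final class RangeExclusionSketch {
    /**
     * Returns the parts of [lo, hi] that are not covered by the excluded
     * ranges. The excluded ranges must be sorted and non-overlapping.
     */
    public static List<int[]> subtract(int lo, int hi, int[][] excluded) {
        List<int[]> result = new ArrayList<>();
        int next = lo; // first code point not yet emitted or excluded
        for (int[] ex : excluded) {
            if (ex[1] < next) {
                continue; // excluded block lies entirely before the cursor
            }
            if (ex[0] > hi) {
                break; // excluded block lies entirely after the broad range
            }
            if (ex[0] > next) {
                result.add(new int[] {next, ex[0] - 1});
            }
            next = Math.max(next, ex[1] + 1);
        }
        if (next <= hi) {
            result.add(new int[] {next, hi});
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumption: only the Arabic-Indic and Devanagari digit blocks.
        int[][] digits = {{0x0660, 0x0669}, {0x0966, 0x096F}};
        for (int[] range : subtract(0x0061, 0xFFFD, digits)) {
            System.out.printf("U+%04X..U+%04X%n", range[0], range[1]);
        }
    }
}
```

Each resulting sub-range corresponds to one alternative that a digit-free `ALL_UNICODE` rule would need, so no character remains matchable by both rules.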
   
   ### Impact
   
   **Before:**
   
   - ⚠️ Build generates lexical nondeterminism warnings
   - ⚠️ Potential for inconsistent tokenization of Unicode digits in region 
names
   
   **After:**
   
   - ✅ Clean build without lexical warnings
   - ✅ Deterministic tokenization of Unicode characters
   - ✅ No functional impact on OQL query parsing
   - ✅ Maintains full backward compatibility
   
   ### Testing
   
   - Verified that `:geode-core:generateGrammarSource` completes without the 
lexical nondeterminism warning
   - No impact on existing OQL functionality as this only affects internal 
lexer disambiguation
   - Unicode digit handling in region names is now consistent and predictable
   
   ### Files Changed
   
   - `oql.g`
   
   > **Note:** This change only affects the internal lexer behavior and has no 
impact on OQL query syntax or semantics. All existing queries will continue to 
work exactly as before.
   
   
   
   ### For all changes:
   - [x] Is there a JIRA ticket associated with this PR? Is it referenced in 
the commit message?
   
   - [x] Has your PR been rebased against the latest commit within the target 
branch (typically `develop`)?
   
   - [x] Is your initial contribution a single, squashed commit?
   
   - [x] Does `gradlew build` run cleanly?
   
   - [ ] Have you written or updated unit tests to verify your changes?
   
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under 
[ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
