This is an automated email from the ASF dual-hosted git repository.

jinwoo pushed a commit to branch develop
in repository https://gitbox.apache.org/repos/asf/geode.git


The following commit(s) were added to refs/heads/develop by this push:
     new 05104a977e [GEODE-10508] Remedation of ANTLR nondeterminism warnings 
in OQL grammar (#7942)
05104a977e is described below

commit 05104a977e2b4d84e1737fea75e89e25e1ec78f0
Author: Jinwoo Hwang <[email protected]>
AuthorDate: Wed Dec 3 11:53:21 2025 -0500

    [GEODE-10508] Remedation of ANTLR nondeterminism warnings in OQL grammar 
(#7942)
    
    * GEODE-10508: Fix ANTLR nondeterminism warnings in OQL grammar
    
    This commit resolves four nondeterminism warnings generated by ANTLR during
    the OQL grammar compilation process. These warnings indicated parser 
ambiguity
    that could lead to unpredictable parsing behavior.
    
    Problem Analysis:
    -----------------
    1. Lines 574 & 578 (projection rule):
       The parser could not distinguish between aggregateExpr and expr 
alternatives
       when encountering aggregate function keywords (sum, avg, min, max, 
count).
       These keywords are valid both as:
       - Aggregate function identifiers: sum(field)
       - Regular identifiers in expressions: sum as a field name
    
       Without lookahead, ANTLR could not deterministically choose which 
production
       rule to apply, resulting in nondeterminism warnings.
    
    2. Lines 961 & 979 (aggregateExpr rule):
       Optional 'distinct' keyword created ambiguity in aggregate function 
parsing.
       The parser could not decide whether to:
       - Match the optional 'distinct' keyword, or
       - Skip it and proceed directly to the expression
    
       Both paths were valid, but ANTLR's default behavior doesn't specify
       preference, causing nondeterminism.
    
    Solution Implemented:
    --------------------
    1. Added syntactic predicates to projection rule (lines 574, 578):
       Predicate: (('sum'|'avg'|'min'|'max'|'count') TOK_LPAREN)=>
    
       This instructs the parser to look ahead and check if an aggregate keyword
       is followed by a left parenthesis. If true, it chooses aggregateExpr;
       otherwise, it chooses expr. This resolves the ambiguity by providing
       explicit lookahead logic.
    
    2. Added greedy option to aggregateExpr rule (lines 961, 979):
       Option: options {greedy=true;}
    
       This tells the parser to greedily match the 'distinct' keyword whenever
       it appears, rather than being ambiguous about whether to match or skip.
       The greedy option eliminates the nondeterminism by establishing clear
       matching priority.
    
    3. Updated test to use token constants (AbstractCompiledValueTestJUnitTest):
       Changed: hardcoded value 89 -> OQLLexerTokenTypes.LITERAL_or
    
       Rationale: Adding syntactic predicates changes ANTLR's token numbering
       in the generated lexer (LITERAL_or shifted from 89 to 94). Using the
       constant ensures test correctness regardless of future grammar changes.
       This is a best practice for maintaining test stability.
    
    Impact:
    -------
    - Zero nondeterminism warnings from ANTLR grammar generation
    - No changes to OQL syntax or semantics (fully backward compatible)
    - No runtime behavior changes (modifications only affect parser generation)
    - All existing tests pass with updated token reference
    - Improved parser determinism and maintainability
    
    Technical Details:
    -----------------
    - Syntactic predicates (=>) are standard ANTLR 2 feature for lookahead
    - Greedy option is standard ANTLR feature for optional subrule 
disambiguation
    - Token constant usage follows best practices for generated code references
    - Changes are compile-time only with no runtime performance impact
    
    Files Modified:
    --------------
    - 
geode-core/src/main/antlr/org/apache/geode/cache/query/internal/parse/oql.g
    - 
geode-core/src/test/java/org/apache/geode/cache/query/internal/AbstractCompiledValueTestJUnitTest.java
    
    * GEODE-10508: Apply code formatting to test file
    
    Fix line length formatting for improved readability.
---
 .../org/apache/geode/cache/query/internal/parse/oql.g   | 17 +++++++++++++----
 .../internal/AbstractCompiledValueTestJUnitTest.java    |  9 ++++++++-
 2 files changed, 21 insertions(+), 5 deletions(-)

diff --git 
a/geode-core/src/main/antlr/org/apache/geode/cache/query/internal/parse/oql.g 
b/geode-core/src/main/antlr/org/apache/geode/cache/query/internal/parse/oql.g
index cdd1623333..5ae8b4e4a7 100644
--- 
a/geode-core/src/main/antlr/org/apache/geode/cache/query/internal/parse/oql.g
+++ 
b/geode-core/src/main/antlr/org/apache/geode/cache/query/internal/parse/oql.g
@@ -571,11 +571,16 @@ projectionAttributes :
 
 projection!{ AST node  = null;}:
         
-            lb1:identifier TOK_COLON!  ( tok1:aggregateExpr{node = #tok1;} | 
tok2:expr{node = #tok2;})
+            // Use syntactic predicate to resolve nondeterminism between 
aggregateExpr and expr.
+            // The predicate checks for aggregate function keywords (sum, avg, 
min, max, count) followed by '('.
+            // Without this, the parser cannot determine which alternative to 
choose when it sees these keywords,
+            // since they can also be used as identifiers in regular 
expressions.
+            lb1:identifier TOK_COLON!  ( (("sum"|"avg"|"min"|"max"|"count") 
TOK_LPAREN)=> tok1:aggregateExpr{node = #tok1;} | tok2:expr{node = #tok2;})
             { #projection = #([PROJECTION, "projection",
             "org.apache.geode.cache.query.internal.parse.ASTProjection"],  
node, #lb1); } 
         |
-            (tok3:aggregateExpr{node = #tok3;} | tok4:expr{node = #tok4;})
+            // Same syntactic predicate as above to handle projections without 
a label (identifier:)
+            ((("sum"|"avg"|"min"|"max"|"count") TOK_LPAREN)=> 
tok3:aggregateExpr{node = #tok3;} | tok4:expr{node = #tok4;})
             (
                 "as"
                 lb2: identifier
@@ -958,7 +963,10 @@ collectionExpr :
 aggregateExpr  { int aggFunc = -1; boolean distinctOnly = false; }:
 
              !("sum" {aggFunc = SUM;} | "avg" {aggFunc = AVG;} )
-              TOK_LPAREN ("distinct"! {distinctOnly = true;} ) ? tokExpr1:expr 
TOK_RPAREN 
+              // Use greedy option to resolve nondeterminism with optional 
'distinct' keyword.
+              // Greedy tells the parser to match 'distinct' whenever it 
appears, rather than
+              // being ambiguous about whether to match it or skip directly to 
the expression.
+              TOK_LPAREN (options {greedy=true;}: "distinct"! {distinctOnly = 
true;} ) ? tokExpr1:expr TOK_RPAREN 
               { #aggregateExpr = #([AGG_FUNC, "aggregate", 
"org.apache.geode.cache.query.internal.parse.ASTAggregateFunc"],
               #tokExpr1); 
                 
((ASTAggregateFunc)#aggregateExpr).setAggregateFunctionType(aggFunc);
@@ -975,8 +983,9 @@ aggregateExpr  { int aggFunc = -1; boolean distinctOnly = 
false; }:
              
              |
               
"count"^<AST=org.apache.geode.cache.query.internal.parse.ASTAggregateFunc>
+              // Same greedy option as above for count's optional 'distinct' 
keyword
               TOK_LPAREN!  ( TOK_STAR 
<AST=org.apache.geode.cache.query.internal.parse.ASTDummy>
-              | ("distinct"! {distinctOnly = true;} ) ? expr ) TOK_RPAREN! 
+              | (options {greedy=true;}: "distinct"! {distinctOnly = true;} ) 
? expr ) TOK_RPAREN! 
               {  
                  
((ASTAggregateFunc)#aggregateExpr).setAggregateFunctionType(COUNT);
                  #aggregateExpr.setText("aggregate");
diff --git 
a/geode-core/src/test/java/org/apache/geode/cache/query/internal/AbstractCompiledValueTestJUnitTest.java
 
b/geode-core/src/test/java/org/apache/geode/cache/query/internal/AbstractCompiledValueTestJUnitTest.java
index 8f1e4f4d28..b996f872b1 100644
--- 
a/geode-core/src/test/java/org/apache/geode/cache/query/internal/AbstractCompiledValueTestJUnitTest.java
+++ 
b/geode-core/src/test/java/org/apache/geode/cache/query/internal/AbstractCompiledValueTestJUnitTest.java
@@ -24,6 +24,7 @@ import junitparams.Parameters;
 import org.junit.Test;
 import org.junit.runner.RunWith;
 
+import org.apache.geode.cache.query.internal.parse.OQLLexerTokenTypes;
 import org.apache.geode.cache.query.internal.types.CollectionTypeImpl;
 import org.apache.geode.test.junit.runners.GeodeParamsRunner;
 
@@ -47,7 +48,13 @@ public class AbstractCompiledValueTestJUnitTest {
             new LinkedHashMap<>()),
         new CompiledIn(compiledValue1, compiledValue2),
         new CompiledIteratorDef("test", new CollectionTypeImpl(), 
compiledValue1),
-        new CompiledJunction(new CompiledValue[] {compiledValue1, 
compiledValue2}, 89),
+        // Changed from hardcoded value 89 to OQLLexerTokenTypes.LITERAL_or 
constant.
+        // The hardcoded value 89 was the token number for LITERAL_or in the 
original grammar,
+        // but after adding syntactic predicates to fix nondeterminism 
warnings, the token
+        // numbering changed (LITERAL_or is now 94). Using the constant 
ensures this test
+        // remains correct regardless of future grammar changes.
+        new CompiledJunction(new CompiledValue[] {compiledValue1, 
compiledValue2},
+            OQLLexerTokenTypes.LITERAL_or),
         new CompiledLike(compiledValue1, compiledValue2),
         new CompiledLiteral(compiledValue1),
         new CompiledMod(compiledValue1, compiledValue2),

Reply via email to