JinwooHwang opened a new pull request, #7942:
URL: https://github.com/apache/geode/pull/7942
### Summary
This PR resolves four nondeterminism warnings generated by ANTLR during the
OQL (Object Query Language) grammar compilation process. These warnings
indicated parser ambiguity that could lead to unpredictable parsing behavior.
### Issue
Fixes GEODE-10508
### Problem Description
During the `generateGrammarSource` task, ANTLR produced the following
warnings:
```
> Task :geode-core:generateGrammarSource
/home/runner/work/geode/geode/geode-core/src/main/antlr/org/apache/geode/cache/query/internal/parse/oql.g:574:40:
warning:nondeterminism between alts 1 and 2 of block upon
k==1:"sum","avg","min","max","count"
k==2:TOK_LPAREN
/home/runner/work/geode/geode/geode-core/src/main/antlr/org/apache/geode/cache/query/internal/parse/oql.g:578:13:
warning:nondeterminism between alts 1 and 2 of block upon
k==1:"sum","avg","min","max","count"
k==2:TOK_LPAREN
/home/runner/work/geode/geode/geode-core/src/main/antlr/org/apache/geode/cache/query/internal/parse/oql.g:961:26:
warning:nondeterminism between alts 1 and 2 of block upon
k==1:"distinct"
k==2:TOK_LPAREN,TOK_PLUS,TOK_MINUS,TOK_DOLLAR,...
/home/runner/work/geode/geode/geode-core/src/main/antlr/org/apache/geode/cache/query/internal/parse/oql.g:979:17:
warning:nondeterminism between alts 1 and 2 of block upon
k==1:"distinct"
k==2:TOK_LPAREN,TOK_PLUS,TOK_MINUS,TOK_DOLLAR,...
```
#### Root Cause
**Lines 574 & 578 (projection rule):**
- The parser could not distinguish between `aggregateExpr` and `expr`
alternatives when encountering aggregate function keywords (`sum`, `avg`,
`min`, `max`, `count`)
- These keywords are valid both as:
- Aggregate function identifiers: `sum(field)`
- Regular identifiers in expressions: `sum` as a field name
- Without lookahead, ANTLR could not deterministically choose which
production rule to apply
**Lines 961 & 979 (aggregateExpr rule):**
- The optional `distinct` keyword created ambiguity in aggregate function
parsing
- The parser could not decide whether to match the optional `distinct`
keyword or skip it and proceed directly to the expression
- Both paths were valid, but ANTLR's default behavior doesn't specify
preference
### Solution
#### 1. Added Syntactic Predicates (Lines 574 & 578)
Added lookahead predicates to the `projection` rule:
```antlr
// Before:
( tok1:aggregateExpr{node = #tok1;} | tok2:expr{node = #tok2;})
// After:
( (("sum"|"avg"|"min"|"max"|"count") TOK_LPAREN)=> tok1:aggregateExpr{node =
#tok1;} | tok2:expr{node = #tok2;})
```
**Reasoning:**
The predicate `(("sum"|"avg"|"min"|"max"|"count") TOK_LPAREN)=>` instructs
the parser to look ahead and check if an aggregate keyword is followed by a
left parenthesis. If true, it chooses `aggregateExpr`; otherwise, it chooses
`expr`. This provides explicit lookahead logic to resolve the ambiguity.
#### 2. Added Greedy Option (Lines 961 & 979)
Added `greedy` option for optional `distinct` keywords:
```antlr
// Before:
TOK_LPAREN ("distinct"! {distinctOnly = true;} ) ? tokExpr1:expr TOK_RPAREN
// After:
TOK_LPAREN (options {greedy=true;}: "distinct"! {distinctOnly = true;} ) ?
tokExpr1:expr TOK_RPAREN
```
**Reasoning:**
The `greedy` option tells the parser to greedily match the `distinct`
keyword whenever it appears, rather than being ambiguous about whether to match
or skip. This establishes clear matching priority and eliminates nondeterminism.
#### 3. Updated Test to Use Token Constants
Modified `AbstractCompiledValueTestJUnitTest.java`:
```java
// Before:
new CompiledJunction(new CompiledValue[] {compiledValue1, compiledValue2},
89)
// After:
import org.apache.geode.cache.query.internal.parse.OQLLexerTokenTypes;
...
new CompiledJunction(new CompiledValue[] {compiledValue1, compiledValue2},
OQLLexerTokenTypes.LITERAL_or)
```
**Reasoning:**
Adding syntactic predicates changes ANTLR's token numbering in the generated
lexer (`LITERAL_or` shifted from 89 to 94). Using the constant ensures test
correctness regardless of future grammar changes. This is a best practice for
maintaining test stability.
### Changes Made
-
`geode-core/src/main/antlr/org/apache/geode/cache/query/internal/parse/oql.g`
- Added syntactic predicates to `projection` rule (2 locations)
- Added `greedy` option to `aggregateExpr` rule (2 locations)
- Added inline comments explaining the reasoning
-
`geode-core/src/test/java/org/apache/geode/cache/query/internal/AbstractCompiledValueTestJUnitTest.java`
- Added import for `OQLLexerTokenTypes`
- Replaced hardcoded token value with constant
- Added comments explaining the change
### Testing
#### Verification Steps
- [x] Grammar generation completes without warnings
- [x] All query parser tests pass
- [x] `AbstractCompiledValueTestJUnitTest` passes
- [x] No regressions in query functionality
#### Test Commands
```bash
./gradlew :geode-core:generateGrammarSource
./gradlew :geode-core:test --tests
"org.apache.geode.cache.query.internal.parse.*"
./gradlew :geode-core:test --tests
"org.apache.geode.cache.query.internal.Abstract*"
```
### Impact Assessment
| Category | Impact |
|----------|--------|
| **Risk Level** | Low - Changes only affect parser generation, not runtime
behavior |
| **Performance** | None - Modifications are compile-time only |
| **Compatibility** | Fully maintained - No changes to OQL syntax or
semantics |
| **Backward Compatibility** | Yes - All existing queries work unchanged |
### Benefits
- ✅ Zero nondeterminism warnings from ANTLR grammar generation
- ✅ Improved parser determinism and predictability
- ✅ Better maintainability with documented reasoning
- ✅ More robust test suite using constants instead of magic numbers
- ✅ No breaking changes or behavior modifications
### Technical Details
- Syntactic predicates (`=>`) are a standard ANTLR 2 feature for lookahead
- Greedy option is a standard ANTLR feature for optional subrule
disambiguation
- Token constant usage follows best practices for generated code references
- Changes are compile-time only with no runtime performance impact
### Checklist
- [x] Changes are properly documented with inline comments
- [x] All tests pass
- [x] No nondeterminism warnings
- [x] Backward compatibility maintained
- [x] Commit message follows project guidelines
- [x] Code follows existing style and conventions
<!-- Thank you for submitting a contribution to Apache Geode. -->
<!-- In order to streamline review of your contribution we ask that you
ensure you've taken the following steps. -->
### For all changes, please confirm:
- [x] Is there a JIRA ticket associated with this PR? Is it referenced in
the commit message?
- [x] Has your PR been rebased against the latest commit within the target
branch (typically `develop`)?
- [x] Is your initial contribution a single, squashed commit?
- [x] Does `gradlew build` run cleanly?
- [ ] Have you written or updated unit tests to verify your changes?
- [ ] If adding new dependencies to the code, are these dependencies
licensed in a way that is compatible for inclusion under [ASF
2.0](http://www.apache.org/legal/resolved.html#category-a)?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]