terrymanu commented on issue #29672:
URL:
https://github.com/apache/shardingsphere/issues/29672#issuecomment-3531786468
Root Cause Analysis
After comprehensive analysis of the ShardingSphere codebase, I have
identified the root cause of this performance issue. This is a grammar design
problem in the PostgreSQL ANTLR grammar file, not a visitor implementation
issue.
Key Differences Between MySQL and PostgreSQL
MySQL Grammar (Efficient)
```
predicate
: bitExpr NOT? IN subquery
| bitExpr NOT? IN LP_ expr (COMMA_ expr)* RP_
// ...
```
PostgreSQL Grammar (Problematic)
```
aExpr IN inExpr
aExpr NOT IN inExpr
inExpr
: selectWithParens | LP_ exprList RP_
exprList
: aExpr
| exprList COMMA_ aExpr // Left recursion issue!
```
Problem Explanation
MySQL uses (COMMA_ expr)* which generates a flat list structure that ANTLR
can access via ctx.expr() in O(n) time.
PostgreSQL uses left recursion in exprList which creates a deeply nested
binary tree structure requiring recursive processing with O(n²) complexity.
For 15,000 IN clause elements:
- MySQL: 15,000 simple iterations - linear performance
- PostgreSQL: 15,000-level recursion depth - exponential performance
degradation
Fix Suggestions
Option 1: Modify PostgreSQL Grammar (Recommended)
Change the PostgreSQL grammar to use a flatter structure similar to MySQL:
```
inExpr
: selectWithParens
| LP_ aExpr (COMMA_ aExpr)* RP_
```
This would eliminate the recursive exprList rule entirely for IN
expressions.
Option 2: Optimize Visitor Implementation
If grammar changes are not feasible, optimize the visitor to use iteration
instead of recursion:
```java
public ASTNode visitExprList(final ExprListContext ctx) {
CollectionValue<ExpressionSegment> result = new CollectionValue<>();
ExprListContext current = ctx;
while (null != current) {
result.getValue().add((ExpressionSegment) visit(current.aExpr()));
current = current.exprList();
}
Collections.reverse(result.getValue()); // Fix order
return result;
}
```
Option 3: Add Configuration Limits
Add reasonable limits for IN clause sizes to prevent performance issues:
```java
// Add configuration property
private static final int MAX_IN_CLAUSE_SIZE = 1000;
if (result.getValue().size() > MAX_IN_CLAUSE_SIZE) {
throw new SQLParsingException("IN clause exceeds maximum size of " +
MAX_IN_CLAUSE_SIZE);
}
```
Call for Contribution
We warmly invite you to submit a Pull Request to implement one of these
fixes. Here's what would be needed:
1. Implement the chosen fix (Option 1 is strongly recommended)
2. Add comprehensive tests for large IN clauses (10,000+ elements)
3. Verify performance improvements with benchmarks
4. Ensure backwards compatibility with existing functionality
The ShardingSphere community would greatly appreciate your contribution to
fixing this performance issue that affects PostgreSQL users with large IN
clauses.
Would you be interested in submitting a PR to implement one of these
solutions? We'd be happy to guide you through the contribution process and help
review the implementation.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]