terrymanu commented on issue #29672:
URL: 
https://github.com/apache/shardingsphere/issues/29672#issuecomment-3531786468

   Root Cause Analysis
   
     After comprehensive analysis of the ShardingSphere codebase, I have 
identified the root cause of this performance issue. This is a grammar design 
problem in the PostgreSQL ANTLR grammar file, not a visitor implementation
     issue.
   
     Key Differences Between MySQL and PostgreSQL
   
     MySQL Grammar (Efficient)
   
   ```
     predicate
         : bitExpr NOT? IN subquery
         | bitExpr NOT? IN LP_ expr (COMMA_ expr)* RP_
         // ...
   ```
     PostgreSQL Grammar (Problematic)
   
   ```
     aExpr IN inExpr
     aExpr NOT IN inExpr
   
     inExpr
         : selectWithParens | LP_ exprList RP_
   
     exprList
         : aExpr
         | exprList COMMA_ aExpr  // Left recursion issue!
   ```
   
     Problem Explanation
   
     MySQL uses (COMMA_ expr)* which generates a flat list structure that ANTLR 
can access via ctx.expr() in O(n) time.
   
     PostgreSQL uses left recursion in exprList which creates a deeply nested 
binary tree structure requiring recursive processing with O(n²) complexity.
   
     For 15,000 IN clause elements:
     - MySQL: 15,000 simple iterations - linear performance
     - PostgreSQL: 15,000-level recursion depth - exponential performance 
degradation
   
     Fix Suggestions
   
     Option 1: Modify PostgreSQL Grammar (Recommended)
   
     Change the PostgreSQL grammar to use a flatter structure similar to MySQL:
   
   ```
     inExpr
         : selectWithParens
         | LP_ aExpr (COMMA_ aExpr)* RP_
   ```
   
     This would eliminate the recursive exprList rule entirely for IN 
expressions.
   
     Option 2: Optimize Visitor Implementation
   
     If grammar changes are not feasible, optimize the visitor to use iteration 
instead of recursion:
   
   ```java
     public ASTNode visitExprList(final ExprListContext ctx) {
         CollectionValue<ExpressionSegment> result = new CollectionValue<>();
         ExprListContext current = ctx;
         while (null != current) {
             result.getValue().add((ExpressionSegment) visit(current.aExpr()));
             current = current.exprList();
         }
         Collections.reverse(result.getValue()); // Fix order
         return result;
     }
   ```
   
     Option 3: Add Configuration Limits
   
     Add reasonable limits for IN clause sizes to prevent performance issues:
   
   ```java
     // Add configuration property
     private static final int MAX_IN_CLAUSE_SIZE = 1000;
   
     if (result.getValue().size() > MAX_IN_CLAUSE_SIZE) {
         throw new SQLParsingException("IN clause exceeds maximum size of " + 
MAX_IN_CLAUSE_SIZE);
     }
   ```
   
     Call for Contribution
   
     We warmly invite you to submit a Pull Request to implement one of these 
fixes. Here's what would be needed:
   
     1. Implement the chosen fix (Option 1 is strongly recommended)
     2. Add comprehensive tests for large IN clauses (10,000+ elements)
     3. Verify performance improvements with benchmarks
     4. Ensure backwards compatibility with existing functionality
   
     The ShardingSphere community would greatly appreciate your contribution to 
fixing this performance issue that affects PostgreSQL users with large IN 
clauses.
   
     Would you be interested in submitting a PR to implement one of these 
solutions? We'd be happy to guide you through the contribution process and help 
review the implementation.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to