Charles Givre created CALCITE-7202:
--------------------------------------

             Summary: Memory Explosion in Large IN Clauses
                 Key: CALCITE-7202
                 URL: https://issues.apache.org/jira/browse/CALCITE-7202
             Project: Calcite
          Issue Type: Bug
    Affects Versions: 1.40.0
            Reporter: Charles Givre


We have been trying to update Drill’s Calcite dependency from version 1.34 to 
1.40.  It has been quite a challenge, but I feel we are very close to a working 
solution.  We’ve run into an issue which we can’t seem to solve and would like 
to request some help from the Calcite community. 

Here is a link to the draft PR:  [https://github.com/apache/drill/pull/3024].  
This may not have all the latest attempts but should be fairly recent.  The 
specific test which you can use to verify this here:

https://github.com/apache/drill/blob/ee4c0236f8bc7d8f7c7b21f4e5c94939fe62e900/exec/java-exec/src/test/java/org/apache/drill/TestInList.java#L32-L42


The gist of the issue is that when Drill is processing queries with large IN 
clauses (60+ items), we observe exponential memory growth leading to 
OutOfMemoryError or test timeouts. This occurs specifically during the 
SUBQUERY_REWRITE planning phase.

Here is a sample of a failing query:

SELECT employee_id FROM cp.`employee.json`
WHERE employee_id IN (1, 2, 3, ..., 60)  -- 60 items

Basically, we’re getting the following errors:
* Memory explosion during rule matching/firing
* OutOfMemoryError in SubQueryRemoveRule.rewriteIn()
* Complete system hang requiring timeout termination

To solve this, we tried a few different approaches including:
* Rule-level fixes: Modified DrillSubQueryRemoveRule.matches() to detect and 
skip large IN clauses
* Apply-method handling: Added exception catching and fallback logic in apply()
* Threshold tuning: Tested various IN clause size limits (50, 100, 200 items)
* Memory analysis: Confirmed the issue exists even with minimal rule 
configurations

We found that the Memory explosion occurs before apply(). The issue manifests 
itself during rule matching/firing phase, not in rule execution.   The 
exponential growth pattern appears to be related to variant creation or trait 
propagation.  I will also add that this works fine with ~20 items but fails 
consistently with 60+ items.  These queries worked with Calcite 1.34 but are 
failing as part with the upgrade to Calcite 1.40.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to