[jira] [Created] (CALCITE-7202) Memory Explosion in Large IN Clauses

Charles Givre (Jira) Fri, 26 Sep 2025 11:41:42 -0700

Charles Givre created CALCITE-7202:
--------------------------------------

             Summary: Memory Explosion in Large IN Clauses
                 Key: CALCITE-7202
                 URL: https://issues.apache.org/jira/browse/CALCITE-7202
             Project: Calcite
          Issue Type: Bug
    Affects Versions: 1.40.0
            Reporter: Charles Givre

We have been trying to update Drill’s Calcite dependency from version 1.34 to
1.40. It has been quite a challenge, but I feel we are very close to a working
solution. We’ve run into an issue which we can’t seem to solve and would like
to request some help from the Calcite community.

Here is a link to the draft PR: [https://github.com/apache/drill/pull/3024].
This may not have all the latest attempts but should be fairly recent. The
specific test you can use to verify this is here:

https://github.com/apache/drill/blob/ee4c0236f8bc7d8f7c7b21f4e5c94939fe62e900/exec/java-exec/src/test/java/org/apache/drill/TestInList.java#L32-L42

The gist of the issue is that when Drill is processing queries with large IN
clauses (60+ items), we observe exponential memory growth leading to
OutOfMemoryError or test timeouts. This occurs specifically during the
SUBQUERY_REWRITE planning phase.

Here is a sample of a failing query:

SELECT employee_id FROM cp.`employee.json`
WHERE employee_id IN (1, 2, 3, ..., 60) -- 60 items

We are seeing the following symptoms:
* Memory explosion during rule matching/firing
* OutOfMemoryError in SubQueryRemoveRule.rewriteIn()
* Complete system hang requiring timeout termination

To solve this, we tried a few different approaches including:
* Rule-level fixes: Modified DrillSubQueryRemoveRule.matches() to detect and
skip large IN clauses
* Apply-method handling: Added exception catching and fallback logic in apply()
* Threshold tuning: Tested various IN clause size limits (50, 100, 200 items)
* Memory analysis: Confirmed the issue exists even with minimal rule
configurations

We found that the memory explosion occurs before apply() is reached: the issue
manifests during the rule matching/firing phase, not during rule execution. The
exponential growth pattern appears to be related to variant creation or trait
propagation. The query works fine with ~20 items but fails consistently with
60+ items. These queries worked with Calcite 1.34 but fail after the upgrade to
Calcite 1.40.
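
To narrow down where between ~20 and 60 items the failure starts, the sample
query can be generated at arbitrary list sizes. The helper below is only a
sketch for that purpose: the class and method names (InListQuery, build) are
hypothetical and are not part of Drill, Calcite, or the linked test. It only
builds the SQL text of the sample query and does not run anything against
Drill.

```java
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical helper (not part of Drill or Calcite): builds the sample
// query from this report with an IN list containing the values 1..n.
final class InListQuery {

  static String build(int n) {
    if (n < 1) {
      throw new IllegalArgumentException("n must be >= 1");
    }
    // Join 1..n into "1, 2, 3, ..." for the IN list.
    String items = IntStream.rangeClosed(1, n)
        .mapToObj(Integer::toString)
        .collect(Collectors.joining(", "));
    return "SELECT employee_id FROM cp.`employee.json` "
        + "WHERE employee_id IN (" + items + ")";
  }

  public static void main(String[] args) {
    // Sizes chosen to bracket the range described above; adjust as needed.
    for (int n : new int[] {20, 40, 60}) {
      System.out.println(n + " items: " + build(n));
    }
  }
}
```

Feeding the generated strings to the test at increasing sizes should show
whether the failure starts at a specific item count or degrades gradually.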

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
