Charles Givre created CALCITE-7202:
--------------------------------------
Summary: Memory Explosion in Large IN Clauses
Key: CALCITE-7202
URL: https://issues.apache.org/jira/browse/CALCITE-7202
Project: Calcite
Issue Type: Bug
Affects Versions: 1.40.0
Reporter: Charles Givre
We have been trying to update Drill’s Calcite dependency from version 1.34 to
1.40. It has been quite a challenge, but I feel we are very close to a working
solution. We’ve run into an issue which we can’t seem to solve and would like
to request some help from the Calcite community.
Here is a link to the draft PR: [https://github.com/apache/drill/pull/3024].
This may not have all the latest attempts but should be fairly recent. The
specific test which you can use to verify this here:
https://github.com/apache/drill/blob/ee4c0236f8bc7d8f7c7b21f4e5c94939fe62e900/exec/java-exec/src/test/java/org/apache/drill/TestInList.java#L32-L42
The gist of the issue is that when Drill is processing queries with large IN
clauses (60+ items), we observe exponential memory growth leading to
OutOfMemoryError or test timeouts. This occurs specifically during the
SUBQUERY_REWRITE planning phase.
Here is a sample of a failing query:
SELECT employee_id FROM cp.`employee.json`
WHERE employee_id IN (1, 2, 3, ..., 60) -- 60 items
Basically, we’re getting the following errors:
* Memory explosion during rule matching/firing
* OutOfMemoryError in SubQueryRemoveRule.rewriteIn()
* Complete system hang requiring timeout termination
To solve this, we tried a few different approaches including:
* Rule-level fixes: Modified DrillSubQueryRemoveRule.matches() to detect and
skip large IN clauses
* Apply-method handling: Added exception catching and fallback logic in apply()
* Threshold tuning: Tested various IN clause size limits (50, 100, 200 items)
* Memory analysis: Confirmed the issue exists even with minimal rule
configurations
We found that the Memory explosion occurs before apply(). The issue manifests
itself during rule matching/firing phase, not in rule execution. The
exponential growth pattern appears to be related to variant creation or trait
propagation. I will also add that this works fine with ~20 items but fails
consistently with 60+ items. These queries worked with Calcite 1.34 but are
failing as part with the upgrade to Calcite 1.40.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)