Done: https://issues.apache.org/jira/browse/CALCITE-7202


> On Sep 26, 2025, at 14:18, Mihai Budiu <[email protected]> wrote:
> 
> This looks like a bug, you should consider filing an issue, especially if you 
> can provide a reproduction. We can consider this as a release blocker and 
> maybe the community can devise a fix before 1.41 - which is planned very soon.
> 
> Mihai
> 
> ________________________________
> From: Charles Givre <[email protected]>
> Sent: Friday, September 26, 2025 8:59 AM
> To: [email protected] <[email protected]>
> Subject: Request for Assistance
> 
> Dear Calcite Community,
> 
> I have a few questions for the community. We have been trying to update 
> Drill’s Calcite dependency from version 1.34 to 1.40. It has been quite a 
> challenge, but I feel we are very close to a working solution. We’ve run 
> into an issue that we can’t seem to solve and would like to request some 
> help from the Calcite community.
> 
> Here is a link to the draft PR:  https://github.com/apache/drill/pull/3024.  
> This may not have all the latest attempts but should be fairly recent.
> 
> The gist of the issue is that when Drill is processing queries with large IN 
> clauses (60+ items), we observe exponential memory growth leading to 
> OutOfMemoryError or test timeouts. This occurs specifically during the 
> SUBQUERY_REWRITE planning phase.
> 
> Here is a sample of a failing query:
> 
> SELECT employee_id FROM cp.`employee.json`
> WHERE employee_id IN (1, 2, 3, ..., 60)  -- 60 items
> 
> Basically, we’re getting the following errors:
> - Memory explosion during rule matching/firing
> - OutOfMemoryError in SubQueryRemoveRule.rewriteIn()
> - Complete system hang requiring timeout termination
> 
> To solve this, we tried a few different approaches, including:
> - Rule-level fixes: modified DrillSubQueryRemoveRule.matches() to detect and 
> skip large IN clauses
> - Apply-method handling: added exception catching and fallback logic in apply()
> - Threshold tuning: tested various IN clause size limits (50, 100, 200 items)
> - Memory analysis: confirmed the issue exists even with minimal rule 
> configurations
> 
> We found that the memory explosion occurs before apply(). The issue manifests 
> itself during the rule matching/firing phase, not in rule execution. The 
> exponential growth pattern appears to be related to variant creation or trait 
> propagation. I will also add that this works fine with ~20 items but fails 
> consistently with 60+ items. These queries worked with Calcite 1.34 but are 
> failing as part of the upgrade to Calcite 1.40.
> 
> My questions:
> 
> 1.  Is this a known issue? Are there known changes in Calcite 1.40's 
> SubQueryRemoveRule that could cause this behavior?
> 2.  Configuration options? Are there planner settings or configuration 
> options that could control the memory usage during subquery rewriting?
> 3.  Alternative approaches? What's the recommended way to handle large IN 
> clauses in Calcite 1.40 while avoiding memory explosion?
> 4.  Performance tuning? Are there specific traits or rule ordering strategies 
> that could mitigate this issue?
> 5.  Do you have any suggestions or advice which could help us resolve this 
> issue and complete the upgrade?
> 
> 
> Technical Details:
> - Environment: Apache Drill 1.23.0-SNAPSHOT
> - Java: OpenJDK 11
> - Test case: TestInList.testLargeInList1 with 60-item IN clause
> - Memory: OutOfMemoryError with default heap settings
> 
> Thank you very much for your assistance.
> Best,
> — C
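
[Editor's note] The "works at ~20 items, fails at 60+" boundary described above lines up with Calcite's documented default IN-list threshold (SqlToRelConverter's inSubQueryThreshold, default 20): small IN lists stay as an inline predicate, while larger ones are converted to a sub-query that SubQueryRemoveRule must then rewrite. The sketch below is self-contained and illustrative only; the class and method names are hypothetical, only the threshold value of 20 mirrors Calcite's documented default, and the exact boundary comparison (< vs <=) in the real converter may differ.

```java
// Hypothetical sketch (class and method names are mine) of how an IN-list
// size threshold selects the rewrite strategy, modeled on Calcite's
// SqlToRelConverter inSubQueryThreshold (documented default: 20).
public class InListThresholdSketch {
    // Calcite's documented default; the real converter's exact boundary
    // comparison (< vs <=) may differ from this sketch.
    static final int DEFAULT_IN_SUB_QUERY_THRESHOLD = 20;

    // Decide how an IN list of the given size would be planned.
    static String strategyFor(int inListSize, int threshold) {
        return inListSize < threshold
                ? "inline OR/Sarg predicate"         // small list: simple filter
                : "sub-query + SubQueryRemoveRule";  // large list: join rewrite
    }

    public static void main(String[] args) {
        System.out.println(strategyFor(3, DEFAULT_IN_SUB_QUERY_THRESHOLD));
        System.out.println(strategyFor(60, DEFAULT_IN_SUB_QUERY_THRESHOLD));
    }
}
```

If this threshold change is indeed the trigger, the configuration knob asked about in question 2 may be Calcite's real SqlToRelConverter.config().withInSubQueryThreshold(...) builder method; wiring that into Drill's SqlConverter is beyond this sketch and untested.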
