Done: https://issues.apache.org/jira/browse/CALCITE-7202
> On Sep 26, 2025, at 14:18, Mihai Budiu <[email protected]> wrote:
>
> This looks like a bug, you should consider filing an issue, especially if you
> can provide a reproduction. We can consider this as a release blocker and
> maybe the community can devise a fix before 1.41 - which is planned very soon.
>
> Mihai
>
> ________________________________
> From: Charles Givre <[email protected]>
> Sent: Friday, September 26, 2025 8:59 AM
> To: [email protected] <[email protected]>
> Subject: Request for Assistance
>
> Dear Calcite Community,
>
> I have a few questions for the community. We have been trying to update
> Drill’s Calcite dependency from version 1.34 to 1.40. It has been quite a
> challenge, but I feel we are very close to a working solution. We’ve run
> into an issue which we can’t seem to solve and would like to request some
> help from the Calcite community.
>
> Here is a link to the draft PR: https://github.com/apache/drill/pull/3024.
> This may not have all the latest attempts but should be fairly recent.
>
> The gist of the issue is that when Drill is processing queries with large IN
> clauses (60+ items), we observe exponential memory growth leading to
> OutOfMemoryError or test timeouts. This occurs specifically during the
> SUBQUERY_REWRITE planning phase.
>
> Here is a sample of a failing query:
>
> SELECT employee_id FROM cp.`employee.json`
> WHERE employee_id IN (1, 2, 3, ..., 60) -- 60 items
>
> Basically, we’re getting the following errors:
> - Memory explosion during rule matching/firing
> - OutOfMemoryError in SubQueryRemoveRule.rewriteIn()
> - Complete system hang requiring timeout termination
>
> To solve this, we tried a few different approaches, including:
> - Rule-level fixes: Modified DrillSubQueryRemoveRule.matches() to detect and
>   skip large IN clauses
> - Apply-method handling: Added exception catching and fallback logic in apply()
> - Threshold tuning: Tested various IN clause size limits (50, 100, 200 items)
> - Memory analysis: Confirmed the issue exists even with minimal rule
>   configurations
>
> We found that the memory explosion occurs before apply(). The issue manifests
> during the rule matching/firing phase, not in rule execution. The exponential
> growth pattern appears to be related to variant creation or trait propagation.
> I will also add that this works fine with ~20 items but fails consistently
> with 60+ items. These queries worked with Calcite 1.34 but are failing as
> part of the upgrade to Calcite 1.40.
>
> My questions:
>
> 1. Is this a known issue? Are there known changes in Calcite 1.40's
>    SubQueryRemoveRule that could cause this behavior?
> 2. Configuration options? Are there planner settings or configuration
>    options that could control memory usage during subquery rewriting?
> 3. Alternative approaches? What's the recommended way to handle large IN
>    clauses in Calcite 1.40 while avoiding memory explosion?
> 4. Performance tuning? Are there specific traits or rule ordering strategies
>    that could mitigate this issue?
> 5. Do you have any suggestions or advice which could help us resolve this
>    issue and complete the upgrade?
>
> Technical Details:
> - Environment: Apache Drill 1.23.0-SNAPSHOT
> - Java: OpenJDK 11
> - Test case: TestInList.testLargeInList1 with 60-item IN clause
> - Memory: OutOfMemoryError with default heap settings
>
> Thank you very much for your assistance.
> Best,
> — C
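
[Editorial note: the ~20-item cutoff reported above happens to match the default of Calcite's `SqlToRelConverter.Config#inSubQueryThreshold`, which is 20: IN lists at or below the threshold are expanded into OR predicates, while larger lists become sub-queries that `SubQueryRemoveRule` must later rewrite. A minimal sketch of raising that threshold is shown below; this is an assumption-laden illustration, not a verified fix, and exactly where Drill's planner builds its `SqlToRelConverter.Config` is not shown here.]

```java
// Sketch only: raise inSubQueryThreshold so that large literal IN lists are
// expanded as OR predicates instead of being turned into sub-queries that
// SubQueryRemoveRule must rewrite. Assumes Calcite 1.40's immutable-config
// API; how (or whether) Drill exposes this config point is an open question.
import org.apache.calcite.sql2rel.SqlToRelConverter;

SqlToRelConverter.Config config = SqlToRelConverter.config()
    .withExpand(false)              // leave remaining sub-queries to the rule
    .withInSubQueryThreshold(1000); // default is 20; lists <= threshold become ORs
```

Note the trade-off: a very high threshold avoids the sub-query rewrite entirely, but a 60-element OR chain has its own planning cost, so the right value would need benchmarking in Drill's test suite.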