As with many performance issues, you are probably running into this because you have set <scaleFactor> to a higher value than anyone has done before. Quite often there is a straightforward solution to these problems: create a test that varies <scaleFactor>, set <scaleFactor> to the smallest value where performance becomes noticeably bad, identify the hot-spot (often something simple like a cartesian loop or hash-key collision), fix the hot-spot, check in the fix, and test with a moderately high value of <scaleFactor>.
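
To illustrate the "smallest value where performance becomes noticeably bad" step, here is a minimal sketch of a sweep harness. The class name ScaleSweep, the method firstSlowScale, and the doubling strategy are all assumptions made for illustration; nothing like this exists in Calcite, and a single wall-clock measurement per step is crude (no JIT warm-up, no repetition).

```java
import java.util.function.IntConsumer;

/** Hypothetical harness, not part of any library: sweeps a scale factor upward. */
public final class ScaleSweep {
  private ScaleSweep() {}

  /**
   * Doubles the scale factor, starting at {@code start}, until a single run of
   * {@code workload} takes longer than {@code thresholdMillis}.
   *
   * @return the first scale factor that crossed the threshold, or -1 if none
   *     did before the scale factor exceeded {@code max}
   */
  public static int firstSlowScale(IntConsumer workload, int start, int max,
      long thresholdMillis) {
    if (start < 1) {
      throw new IllegalArgumentException("start must be >= 1");
    }
    // A long loop variable avoids int overflow when doubling near Integer.MAX_VALUE.
    for (long n = start; n <= max; n *= 2) {
      long t0 = System.nanoTime();
      workload.accept((int) n);
      long elapsedMillis = (System.nanoTime() - t0) / 1_000_000;
      System.out.printf("scaleFactor=%d took %d ms%n", n, elapsedMillis);
      if (elapsedMillis > thresholdMillis) {
        return (int) n;
      }
    }
    return -1;
  }
}
```

The workload would be whatever you suspect is slow, for example a lambda that builds a query of size n and hands it to the planner. The value returned is only a doubling-step upper bound; once you have it, profile at that value rather than at the full production size.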
In your case, <scaleFactor> seems to be the number of CTEs. You should log a Jira case with a simple, repetitive query that has N CTEs and noticeably bad performance.

> On Sep 1, 2024, at 11:22 PM, JinxTheKid <logansmith...@gmail.com> wrote:
>
> Hi community,
>
> I have a unique use case where I have very large queries that have lots of
> interconnected CTEs (90KB+, 100+ CTEs). When I run some of these queries
> through Calcite, I end up with very long compile times, anywhere from 8s to
> 40s. I cannot share the queries, but for context manually optimizing the
> query is not really an option at the moment even though it would likely
> solve the problems I'm encountering.
>
> I've narrowed down the culprit of the slow compile times to
> *SqlToRelConverter's* method *convertQueryRecursive*. For large
> interconnected CTEs, this class ends up executing convertQueryRecursive many
> times over the course of converting a query. This scenario sounded like a
> great use case for memoization to improve the conversion performance, but I
> found that adapting this class was challenging. There is some internal
> state of *SqlToRelConverter's* that does not allow me to simply memoize
> convertQueryRecursive unfortunately. It appears correlated variables are
> (one of) the issue but I'm not certain. Has anyone else in the community
> run into similar issues, and if so what did you do to address this? Is this
> an area the community has looked into?
>
> Thanks,
> Logan
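
P.S. A simple, repetitive N-CTE query of the kind suggested for the Jira case could be generated along these lines. This is only a sketch: the class name CteQueryGenerator, the cte_i naming, and the choice to have each CTE reference the two before it are assumptions for illustration, not a reconstruction of the real queries. The double reference is there so that a converter which expands every CTE reference without sharing would revisit earlier CTEs repeatedly; whether this actually reproduces the slowdown would need to be measured.

```java
/** Hypothetical generator for a repetitive query with N interconnected CTEs. */
public final class CteQueryGenerator {
  private CteQueryGenerator() {}

  /**
   * Builds a query with {@code n} CTEs named cte_0 .. cte_(n-1).
   * cte_0 is a constant row, cte_1 reads cte_0, and every later CTE
   * joins the two CTEs immediately before it.
   */
  public static String generate(int n) {
    if (n < 1) {
      throw new IllegalArgumentException("n must be >= 1");
    }
    StringBuilder sql = new StringBuilder("WITH cte_0 AS (SELECT 1 AS x)");
    for (int i = 1; i < n; i++) {
      sql.append(",\n  cte_").append(i).append(" AS (");
      if (i == 1) {
        // Only one earlier CTE exists, so reference it once.
        sql.append("SELECT x + 1 AS x FROM cte_0");
      } else {
        // Reference the two preceding CTEs to make the graph interconnected.
        sql.append("SELECT a.x + b.x AS x FROM cte_").append(i - 1)
            .append(" a CROSS JOIN cte_").append(i - 2).append(" b");
      }
      sql.append(")");
    }
    // The final SELECT reads the last CTE so that every CTE is reachable.
    sql.append("\nSELECT x FROM cte_").append(n - 1);
    return sql.toString();
  }
}
```

Attaching the generator (or its output for the smallest N that is already slow) to the Jira case gives others something they can run without needing the original queries.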