As with many performance issues, you are probably running into this one because
you have set <scaleFactor> to a higher value than anyone has before. Quite
often there is a straightforward solution to these problems: create a test that
varies <scaleFactor>; set <scaleFactor> to the smallest value where
performance becomes noticeably bad; identify the hot-spot (often something
simple, like a Cartesian loop or hash-key collisions); fix the hot-spot; check in
the fix; and test with a moderately high value of <scaleFactor>.
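
To make the first two steps concrete, here is a minimal sketch of a scale sweep. The class ScaleSweep and its methods are hypothetical names of my own, not part of Calcite, and the workload in main is a stand-in with quadratic cost; you would replace it with the operation you are actually measuring.

```java
import java.util.function.IntConsumer;

/** Hypothetical harness (not part of Calcite): runs a workload at increasing scale factors. */
final class ScaleSweep {
  private ScaleSweep() {}

  /** Returns the wall-clock time, in milliseconds, of one run at the given scale factor. */
  static long timeMillis(IntConsumer workload, int scaleFactor) {
    long start = System.nanoTime();
    workload.accept(scaleFactor);
    return (System.nanoTime() - start) / 1_000_000L;
  }

  /** Returns the first scale factor whose run exceeds thresholdMillis, or -1 if none does. */
  static int firstSlowScale(IntConsumer workload, int[] scaleFactors, long thresholdMillis) {
    for (int n : scaleFactors) {
      long elapsed = timeMillis(workload, n);
      System.out.println("scaleFactor=" + n + " elapsed=" + elapsed + "ms");
      if (elapsed > thresholdMillis) {
        return n;
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    // Stand-in workload with O(n^2) cost; replace with the real operation under test.
    IntConsumer quadratic = n -> {
      long sink = 0;
      for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
          sink += i ^ j;
        }
      }
      // Use the result so the loops are not trivially dead code.
      if (sink == Long.MIN_VALUE) {
        System.out.println(sink);
      }
    };
    int slow = firstSlowScale(quadratic, new int[] {500, 2_000, 8_000, 32_000}, 100L);
    System.out.println("first slow scale factor: " + slow);
  }
}
```

Once you know the smallest slow value, run a profiler at that value; the hot-spot is usually easier to see there than at the full production size.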

In your case, <scaleFactor> seems to be the number of CTEs. You should log a
Jira case with a simple, repetitive query that has N CTEs and noticeably bad
performance.
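
Such a query can be generated rather than written by hand. The sketch below is one possible shape, under my own assumptions: the class CteQueryGenerator and the fanIn parameter are hypothetical, and "each CTE cross-joins the fanIn CTEs before it" is only a guess at what "interconnected" means in your queries. Adjust the shape until it reproduces the slowdown.

```java
/** Hypothetical generator (not part of Calcite) for a repetitive query with N CTEs. */
final class CteQueryGenerator {
  private CteQueryGenerator() {}

  /**
   * Builds a query with n CTEs named c1..cn. CTE c1 is a single-row VALUES;
   * each later CTE reads from up to fanIn immediately preceding CTEs.
   */
  static String generate(int n, int fanIn) {
    if (n < 1 || fanIn < 1) {
      throw new IllegalArgumentException("n and fanIn must be at least 1");
    }
    StringBuilder sql =
        new StringBuilder("WITH c1 AS (SELECT * FROM (VALUES (1)) AS t(x))");
    for (int i = 2; i <= n; i++) {
      int first = Math.max(1, i - fanIn);
      StringBuilder select = new StringBuilder();
      StringBuilder from = new StringBuilder();
      for (int j = first; j < i; j++) {
        if (j > first) {
          select.append(" + ");
          from.append(" CROSS JOIN ");
        }
        select.append("c").append(j).append(".x");
        from.append("c").append(j);
      }
      sql.append(",\nc").append(i).append(" AS (SELECT ").append(select)
          .append(" AS x FROM ").append(from).append(")");
    }
    sql.append("\nSELECT x FROM c").append(n);
    return sql.toString();
  }

  public static void main(String[] args) {
    // Print a small instance so the shape is easy to inspect.
    System.out.println(generate(4, 2));
  }
}
```

Start with fanIn = 1 (a linear chain) and a small N, then increase both. If conversion time grows much faster than N does, attach the generated query and the values of N and fanIn to the Jira case.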

> On Sep 1, 2024, at 11:22 PM, JinxTheKid <logansmith...@gmail.com> wrote:
> 
> Hi community,
> 
> I have a unique use case where I have very large queries that have lots of
> interconnected CTEs (90KB+, 100+ CTEs). When I run some of these queries
> through Calcite, I end up with very long compile times, anywhere from 8s to
> 40s. I cannot share the queries, but for context manually optimizing the
> query is not really an option at the moment even though it would likely
> solve the problems I'm encountering.
> 
> I've narrowed down the culprit of the slow compile times to
> SqlToRelConverter's method convertQueryRecursive. For large
> interconnected CTEs, this class ends up executing convertQueryRecursive many
> times over the course of converting a query. This scenario sounded like a
> great use case for memoization to improve the conversion performance, but I
> found that adapting this class was challenging. There is some internal
> state of SqlToRelConverter that does not allow me to simply memoize
> convertQueryRecursive unfortunately. It appears correlated variables are
> (one of) the issue but I'm not certain. Has anyone else in the community
> run into similar issues, and if so what did you do to address this? Is this
> an area the community has looked into?
> 
> Thanks,
> Logan
