asolimando commented on PR #4267: URL: https://github.com/apache/calcite/pull/4267#issuecomment-2769251382
> > @silundong I don't see anything trying to prevent the rule to be applied indefinitely. Now that we have the maximum expansion it might stop at some point but this won't prevent the rule to be tried. I have seen lots of similar problems leading to OOMs in the past, I think it's dangerous to integrate such a rule as-is, and most definitely I wouldn't put it into the standard core rules which get applied by default. > > @asolimando I have checked in `matchFilter` and `matchJoin` whether the expanded newCondition is equal to the original condition. If they are the same, it will be returned directly. For example the rule is matched for the first time and the condition is successfully expanded, a new Filter/Join is generated. When the rule is matched for the second time, the conditions that have been successfully expanded will not change, so it will be returned directly. I don't think it will be applied infinitely. Is there something else I haven't considered? I have in mind the rules like `JoinPushTransitivePredicatesRule` which, in combination with the present rules (and possibly others like `FilterMergeRule` but not only) can keep "inferring" predicates and apply them indefinitely. The problematic pattern could be as follows: 1) the present rule produces new predicates that can be pushed down 2) `JoinPushTransitivePredicatesRule` infers new predicates/filters that are then transposed and merged with existing ones 3) the present rule will match again as the filter is not the same as before (due to the merging), so the planner will apply the rule again, and we can end up in a cycle I have looked up similar issues we had in the past around this subject: Calcite: - CALCITE-2200 - CALCITE-3505 - CALCITE-6432 - CALCITE-6363 Hive: - HIVE-25275 - HIVE-25758 (covered in a past Calcite meetup, [video](https://youtu.be/_phzRNCWJfw?feature=shared&t=1745), [slides](https://www.slideshare.net/StamatisZampetakis/debugging-planning-issues-using-calcites-builtin-loggers), sorry for the self-citation but it goes in-depth on that example so you can easily get an idea) The Hive ones (which uses Calcite) I have had direct experience with, were happening often due to the presence of disjunctive filters and limitations in `RexSimplify` which at some point was unable to simplify expressions futher. Rules like `JoinPushTransitivePredicatesRule` rely on `RexSimplify` to simplify expressions to reach a fixpoint, in cases where you can infer and merge filters, and the resulting expressions can't be simplified to its original (equivalent) version, it keeps getting recognized as a new RelNode where the same rule has never been applied, so the planner will explore it with all matching rules. I am convinced your proposed optimization is interesting and useful, but I'd rather not have it added to core right away and leave it as an optional rule for the reason highlighted above, unless we have some guarantees that the extra predicates won't lead to cases like the one mentioned above. Mihai is also right about the bloat parameter I have suggested, another source of problems is the complexity of the expansion in some cases, in the best case we are able to determine the size of the expanded expression even before starting it, by looking at the "shape" of the input expression, in that case we would have a "cheap" guard that would let us bail out when the transformation would be too costly. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
