asolimando commented on PR #4267:
URL: https://github.com/apache/calcite/pull/4267#issuecomment-2769251382

   > > @silundong I don't see anything trying to prevent the rule to be applied 
indefinitely. Now that we have the maximum expansion it might stop at some 
point but this won't prevent the rule to be tried. I have seen lots of similar 
problems leading to OOMs in the past, I think it's dangerous to integrate such 
a rule as-is, and most definitely I wouldn't put it into the standard core 
rules which get applied by default.
   > 
   > @asolimando I have checked in `matchFilter` and `matchJoin` whether the 
expanded newCondition is equal to the original condition. If they are the same, 
it will be returned directly. For example the rule is matched for the first 
time and the condition is successfully expanded, a new Filter/Join is 
generated. When the rule is matched for the second time, the conditions that 
have been successfully expanded will not change, so it will be returned 
directly. I don't think it will be applied infinitely. Is there something else 
I haven't considered?
   
   I have in mind the rules like `JoinPushTransitivePredicatesRule` which, in 
combination with the present rules (and possibly others like `FilterMergeRule` 
but not only) can keep "inferring" predicates and apply them indefinitely.
   
   The problematic pattern could be as follows:
   1) the present rule produces new predicates that can be pushed down
   2) `JoinPushTransitivePredicatesRule` infers new predicates/filters that are 
then transposed and merged with existing ones
   3) the present rule will match again as the filter is not the same as before 
(due to the merging), so the planner will apply the rule again, and we can end 
up in a cycle
   
   I have looked up similar issues we had in the past around this subject:
   Calcite:
   - CALCITE-2200
   - CALCITE-3505
   - CALCITE-6432
   - CALCITE-6363
   
   Hive:
   - HIVE-25275
   - HIVE-25758 (covered in a past Calcite meetup, 
[video](https://youtu.be/_phzRNCWJfw?feature=shared&t=1745), 
[slides](https://www.slideshare.net/StamatisZampetakis/debugging-planning-issues-using-calcites-builtin-loggers),
 sorry for the self-citation but it goes in-depth on that example so you can 
easily get an idea)
   
   The Hive ones (which uses Calcite) I have had direct experience with, were 
happening often due to the presence of disjunctive filters and limitations in 
`RexSimplify` which at some point was unable to simplify expressions futher.
   
   Rules like `JoinPushTransitivePredicatesRule` rely on `RexSimplify` to 
simplify expressions to reach a fixpoint, in cases where you can infer and 
merge filters, and the resulting expressions can't be simplified to its 
original (equivalent) version, it keeps getting recognized as a new RelNode 
where the same rule has never been applied, so the planner will explore it with 
all matching rules.
   
   I am convinced your proposed optimization is interesting and useful, but I'd 
rather not have it added to core right away and leave it as an optional rule 
for the reason highlighted above, unless we have some guarantees that the extra 
predicates won't lead to cases like the one mentioned above.
   
   Mihai is also right about the bloat parameter I have suggested, another 
source of problems is the complexity of the expansion in some cases, in the 
best case we are able to determine the size of the expanded expression even 
before starting it, by looking at the "shape" of the input expression, in that 
case we would have a "cheap" guard that would let us bail out when the 
transformation would be too costly.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to