yashlimbad commented on PR #4840: URL: https://github.com/apache/calcite/pull/4840#issuecomment-4160587931
> > > > decorrelating join's condition is harder and can complicate resulting tree sometime. on the other hand decorrelating filter is logically better > > > > > > > > > Making `FilterJoinRule` handle correlated subqueries in JOIN ON condition would make the logic significantly more complex. In addition, the current subquery removal rule already provide a optimization, and their effect is essentially the same as pushing down correlated subqueries in ON condition. Considering the issues mentioned above, I would prefer to prohibit pushing down correlated subqueries. That said, if the issues above can all be addressed properly, that would also be good. But I suspect this would require more rounds of careful review. > > > > > > I agree that subquery remove rule will do the same, but what if someone invokes `FilterJoinRule` but not `SubqueryRemoveRule`? the index would remain broken at that point. `SubqueryRemoveRule` handles it, it doesn't mean we shouldn't fix pushed filter's broken index. And still `SubqueryRemoveRule` will not get correct variable set because we failed to extract variable set from subquery. Give me sometime, I will brainstorm on your outer scope review and comeback. > > Most of Calcite is based on the assumption that subquery removal is always a good first step, and further rules don't have to deal with subqueries. > > The assumption looks reasonable to me, and I don't see a valid reason to do otherwise here. > > More code and more complex support for rules makes them harder to maintain, evolve and understand, without a compelling reason we should strive to keep things simple IMO. @asolimando Thanks, that makes sense from a maintenance/simplicity perspective. I think the concern here is a bit narrower than "every rule should support subqueries". Once FilterJoinRule chooses to rewrite a filter that contains a correlated subquery, it is already operating on that structure. At that point, I think the rule should do one of two things: 1. preserve a valid expression after the rewrite, or 2. explicitly decline that transformation My concern with the current behavior is that it rewrites the predicate but can leave correlated field references inconsistent, which means correctness depends on a later SubQueryRemoveRule pass to repair the intermediate state. In other words, the issue is not only whether SubQueryRemoveRule can eventually handle it, but whether FilterJoinRule should emit an invalid expression for a shape it has already transformed. The PR description itself is fixing exactly that kind of planning/decorrelation correctness issue around correlated variable field indices after filter pushdown. I agree that adding broad subquery support to FilterJoinRule would be too much. So if the preference is to keep subquery handling centralized, I think a conservative alternative would be to skip pushdown when the predicate contains correlated subqueries, rather than allowing a rewrite that can leave broken correlated references. So from my perspective the acceptable outcomes are: * fix the correlated-reference shifting/variable propagation during pushdown, or * prohibit this specific pushdown for correlated subqueries. But relying on a later rule to repair an intermediate invalid state seems brittle. If the preference is to keep subquery handling centralized, I'm happy to revise this toward the conservative bail-out approach and avoid pushing conjuncts that contain correlated references. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
