asolimando commented on a change in pull request #2966:
URL: https://github.com/apache/hive/pull/2966#discussion_r798411704



##########
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveJoinPushTransitivePredicatesRule.java
##########
@@ -144,27 +146,70 @@ public void onMatch(RelOptRuleCall call) {
     }
 
     // We need to filter i) those that have been pushed already as stored in 
the join,
-    // and ii) those that were already in the subtree rooted at child
-    ImmutableList<RexNode> toPush = 
HiveCalciteUtil.getPredsNotPushedAlready(predicatesToExclude,
-            child, valids);
-    return toPush;
+    // ii) those that were already in the subtree rooted at child,
+    // iii) predicates that are not safe for transitive inference.
+    //
+    // There is no formal definition of safety for predicate inference, only 
an empirical one.

Review comment:
       The change does not reinforce the dependency on the registry. If you 
check the call, we are filtering candidate predicates coming from 
`HiveRelMdPredicates`, which is not linked to the registry at all. The 
predicates in the registry are used to add another filtering step (to remove 
"duplicates"), but the two steps are totally independent, they are just 
performed within the same method.
   
   This said, I agree that keeping a global state is a problem, and should be 
avoided. The current implementation is also faulty under some respects.
   
   The motivation for storing predicates in the registry is given in 
[HIVE-12478](https://issues.apache.org/jira/browse/HIVE-12478?focusedCommentId=15047639&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15047639).
   
   Since predicates can be modified (pushed past project/setOp/joins/aggregate, 
they can be simplified or merged), we can try to push them transitively as it 
happens in this OOM cases.
   
   But, as I was saying, the current implementation is faulty, since even the 
registry is not enough, because most of the other rules they don't know about 
it, they will alter the predicates, but they won't register them into the 
registry. Making all those rules aware of the registry would increase the 
coupling, hence not an option IMO.
   
   The registry main use was to prevent firing the same rule against the same 
rel multiple times, but even this use is not complete, since rels are modified 
along the way, and we don't update the registry accordingly. For instance, a 
join condition is updated, the new join has a fresh rel-id, the registry will 
try to re-apply `HiveJoinPushTransitivePredicatesRule`, and if the predicates 
were modified too, this will also escape the duplicate check based on the 
`registry.getPushedPredicates(join)`, and we can have a loop).
   
   On the other hand, I don't see how we can prevent such loops without storing 
a global state. But for this global state to work, each and every rule must be 
aware of it, which is totally not doable IMO, which is also the line in Calcite.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to