I like the idea of redundant filters. We should probably add a new kind of 
metadata so that one node can see that a filter is redundant. I’m not sure how 
that metadata would look. A well chosen use case will help us design it. Does 
anyone have one?

> On Sep 27, 2023, at 1:40 AM, Stamatis Zampetakis <zabe...@gmail.com> wrote:
> 
> Hey Ian,
> 
> I don't think there is such a rule in Calcite but you may find similar
> ideas in rules present in other projects.
> 
> In Hive for instance, there is the HivePreFilteringRule [1, 2] that
> pushes "redundantly" some filters below other operations in the tree.
> What is inherently problematic with all these rules that introduce
> "duplicate" predicates in the plan is that they require some kind of
> state to prevent infinite matching. Additionally, you need to pay
> attention when you put together rules that are pushing and pulling
> since they can interact badly with each other.
> 
> Best,
> Stamatis
> 
> [1] 
> https://github.com/apache/hive/blob/57f096c9a73eb92806f2a7cc97f87fabf5d546fe/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HivePreFilteringRule.java
> [2] 
> https://issues.apache.org/jira/browse/HIVE-9069?focusedCommentId=14534098&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14534098
> 
>> On Tue, Sep 26, 2023 at 10:10 PM Ian Bertolacci
>> <ian.bertola...@workday.com.invalid> wrote:
>> 
>> Hi,
>> I was wondering if there exist any rules to duplicate filters which exist 
>> above the join, whose effect is dependent on the result of the join and 
>> therefore cannot be *pushed* below a join, but could be *duplicated* below 
>> the join.
>> 
>> For example: `select … from A LEFT join B on … where B.field is null`
>> Here, the best we could do is push the filter condition into the join 
>> condition, but not necessarily below it, because the null-ness of the column 
>> is partially dependent on the result of the join.
>> However, in this case we can duplicate the condition below the join:
>> `select … from A LEFT join (select … from B where B.field is null) as B on … 
>> where B.field is null`
>> This is because the condition because the null-ness of the column is also 
>> partially dependent value of the column.
>> With both of these filters in place we capture instances of B which are null 
>> because the column is null and because there was no match to B
>> This (1) reduced the cardinality of that side of the join, and (2) 
>> maintained the original intent of the query.
>> 
>> In this example, I use `is null` but we would like to do this for some of 
>> our custom comparison operators.
>> For these operators, we cannot do push-down (because it would change the 
>> intent of the original query) but doing filter duplication should be fine 
>> (though we’re still making sure of that).
>> 
>> I figure that this probably doesn’t exist, in which case I’ll probably use 
>> FilterIntoJoinRule as a jumping off point.
>> Any other suggestions?
>> 
>> Thanks!
>> -Ian J. Bertolacci

Reply via email to