Hi Thomas, It is not safe to push the filter in the null generating side of a join since it changes the semantics of the query. Consider the example below.
SELECT b.title FROM Book b LEFT JOIN Author a ON b.author = a.id WHERE a.name <> 'Victor' Author -------------- id | name | 1 | Victor | Book ----------------------------------- id | title |author 1 | Les Miserables | 1 If you apply the a.name <> 'Victor' condition before the join (over the author table) then you will get the following result. 'Les Miserables' If you apply the condition after the join you will get the empty result. Best, Stamatis On Thu, Sep 12, 2019 at 11:55 PM Thomas D'Silva <[email protected]> wrote: > We are using calcite to query data in Kudu. We have implemented our own > sort rule that merges rows returned from kudu when a queries orders by > columns in the same order as the primary key of the kudu table. > For a query that does a left join, in order to get the planner to use my > sort rule I had to disable the FILTER_ON_JOIN rule and use > the DUMB_FILTER_ON_JOIN (which does not try to simplify outer joins). I > also had to modify the following code in FilterJoinRule.perform() > > // Try to push down above filters. These are typically where clause > // filters. They can be pushed down if they are not on the NULL > // generating side. > boolean filterPushed = false; > if (RelOptUtil.classifyFilters( > join, > aboveFilters, > joinType, > needsPushInto(join), > !joinType.generatesNullsOnLeft(), > !joinType.generatesNullsOnRight(), > joinFilters, > leftFilters, > rightFilters)) { > filterPushed = true; > } > > The above code sets the pushLeft and pushRight boolean flags only if the > join type does not generate nulls on the left or right side. I was > wondering if any one knows why this is required? > If a filter only references column present in the left or right table > shouldn't we always be able to push it down? >
