james727 commented on issue #1585:
URL: 
https://github.com/apache/arrow-datafusion/issues/1585#issuecomment-1017892324


   @xudong963 I'd like to pick this up once 
https://github.com/apache/arrow-datafusion/pull/1618 is (hopefully) eventually 
merged. Question re: implementation - do you think this should be a separate 
optimizer pass or something else?
   
   Apologies in advance if this is all obvious - the reason I ask is that it 
seems that this rule benefits from filter pushdown occurring both before and 
after the join rewrite happens. For example, consider the following contrived 
query:
   ```sql
   SELECT * FROM (
     SELECT * FROM t1 LEFT JOIN t2 ON t1.id = t2.uid
   )
   WHERE t2.uid IS NOT NULL
   ```
   
   If we do not push `t2.uid IS NOT NULL` into the subquery, we will not know 
that the join can be rewritten when we optimize the join node. Thus, we should 
push filters down as far as possible before rewriting the join, giving the 
following:
   ```sql
   -- Push filter into subquery
   SELECT * FROM (
     SELECT * FROM t1 LEFT JOIN t2 ON t1.id = t2.uid
     WHERE t2.uid IS NOT NULL
   )
   
   -- Rewrite join
   SELECT * FROM (
     SELECT * FROM t1 INNER JOIN t2 ON t1.id = t2.uid
     WHERE t2.uid IS NOT NULL
   )
   ```
   
   However, at this point we would benefit from another pass with the filter 
pushdown rule, as `t2.uid IS NOT NULL` can now be pushed down to the underlying 
scan on `t2`. 
   
   It looks like optimizer passes are sequenced in a vector defined in 
`context.rs` - so my question could be reframed as: how would you structure 
this change to facilitate first pushing filters down as far as possible, then 
rewriting joins where possible, then pushing down any filters that are now 
enabled due to outer joins becoming inner joins?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to