Dandandan commented on PR #3228: URL: https://github.com/apache/arrow-datafusion/pull/3228#issuecomment-1229147873
> > Issue https://github.com/apache/arrow-datafusion/issues/3073 seems to be about a filter expression with a column, that somehow doesn't filter the row out.
>
> @Dandandan since the example in #3073 uses an in-memory table, the filter `where f.a=2` got replaced with `where 1=2` in a step of the optimizer (which, at least for me, makes perfect sense) when we are rewriting columns here:
> https://github.com/apache/arrow-datafusion/blob/873b071dff1a6099d30abdd24437e083a60e2686/datafusion/optimizer/src/filter_push_down.rs#L396
>
> > I'm not sure if this fixes the bug in issue https://github.com/apache/arrow-datafusion/issues/3073? I'm not totally convinced we should not propagate filters without column (e.g. constants), the result should remain the same whether it is propagated or not.
>
> There is already a similar work done by an earlier step of the filter pushdown optimizer which ensures that `WHERE 1=2` (and other variants of constant/columns-less filters) are not propagated (since we can't rewrite them if there are no earlier projections to propagate, unlike regular filters). The main thing this PR does is allow this check to be also carried out in `Projection` part where filters might change due to how we propagate constants.
> https://github.com/apache/arrow-datafusion/blob/873b071dff1a6099d30abdd24437e083a60e2686/datafusion/optimizer/src/filter_push_down.rs#L358-L368

Isn't the issue then that the propagated filters without column are not added to the plan at all, even when we are at the bottom of a plan? E.g. a propagated `where false` still should be present regardless of whether it has columns in it.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
