adriangb commented on PR #19404: URL: https://github.com/apache/datafusion/pull/19404#issuecomment-3728564354
> > @discord9 I can't make a PR to your fork as far as far as I can tell, but can you check the diff in [7ee2bfb](https://github.com/apache/datafusion/commit/7ee2bfb353c30764f848491298f2124421f617c7) and see what you think? Sorry some of it is in [22981aa](https://github.com/apache/datafusion/commit/22981aa3c2b3d9f789e608e5dcccfd8f2b52cba8) as well. > > I have some doubt about leave it unchanged, say if in this case: > > ``` > Sort filter=a@0>800 > <Some strange Extension node that went unhandled>(say something cause a unknown column(a new a@0 in this case) to appear> > Projection some other columns, old a@0 not included(filter should be unknown column at this stage) > <do something with old a@0> > Projection b@0 as a@0 > TableScan filter= DynamicFilter [ b@0 > 800](should be unknown column, but left unchange confuse it with existing columns) > ``` > > I guess my point is that replace not found column with `UnknownColumn` will allow downstream user to better debug their problem when `DynamicFilter` is involved(Since evaluate `UnknownColumn` return errors, and downstream custom TableScan could use this to easily determine whether he can use this dynamic filter, and even downstream user miss to handle Extension node's gather pushdown, it's easier to spot due to `UnknownColumn`), and for datafusion itself, in normal case without external extension logical plan, projection shouldn't cause unknown columns in filter, so it's ok? But if we left unknown columns unchanged, certain query(with extension plan) might confuse those unchanged columns with something else and cause hidden bug which is much hard to discover or debug, so having `UnknownColumn` is more of a defensive guard thing? And it seems better to have a explict way to see if it went wrong does no harm and is beneficial? Altough if you insist I can cherry pick [7ee2bfb](http s://github.com/apache/datafusion/commit/7ee2bfb353c30764f848491298f2124421f617c7) Edit: Maybe a middle ground is just refuse to pushdown filters when encounter unknown columns? Edit: done I just don't think this will happen. If a plan is introducing an `UnknownColumn` indeed we can't support projection pushdown. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
