adriangb commented on issue #19092: URL: https://github.com/apache/datafusion/issues/19092#issuecomment-3679165435
OK, I've done a bit of digging. First, I apologize for opening a confusing issue. I ran into this while working on a bigger change and wanted to document it to tackle later, but evidently, even though I included a lot of detail, I didn't include the right detail.

I'm still not sure what my original e2e reproduction for this was. Since I opened this on December 4th, it's possible it was fixed by #19130 (const simplifier), which was merged a couple of days later, or by #19111, which was merged after that and changed the structure that @ShashidharM0118 points out in https://github.com/apache/datafusion/pull/19434#issuecomment-3678860994 would also have caused this issue.

I did find something interesting: #19136 introduced a new opportunity for optimization via simplification. If we have a constant column, say `a = 2`, and the predicate `a is not null`, we rewrite it to `2 is not null`. That gets simplified later for the scan, but as far as I can tell it does *not* get simplified before being fed into `FilePruner`. `FilePruner` also does not simplify the output of any dynamic filters. So I think if we added a simplifier pass in `FilePruner`, we'd get some extra pruning in the case of `constant_col is null`.
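
To illustrate why a simplifier pass would matter here, below is a minimal sketch in plain Rust. It does not use DataFusion's API: the `Expr` enum and the functions `substitute_constant`, `simplify`, and `can_prune_file` are hypothetical stand-ins invented for this example. The assumption is that a pruner can only skip a file when the predicate is a literal `false`, so `2 is null` has to be folded before the pruner sees it.

```rust
// Toy expression type for illustration only. This is NOT DataFusion's
// `Expr` or `PhysicalExpr`; every name in this block is hypothetical.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Int(Option<i64>), // `None` models a NULL literal
    Bool(bool),
    IsNull(Box<Expr>),
    IsNotNull(Box<Expr>),
}

/// Replace references to a column that is known to be constant for a file
/// with its literal value, e.g. `a is null` becomes `2 is null`.
fn substitute_constant(expr: Expr, col: &str, value: Option<i64>) -> Expr {
    match expr {
        Expr::Column(name) if name == col => Expr::Int(value),
        Expr::IsNull(inner) => {
            Expr::IsNull(Box::new(substitute_constant(*inner, col, value)))
        }
        Expr::IsNotNull(inner) => {
            Expr::IsNotNull(Box::new(substitute_constant(*inner, col, value)))
        }
        other => other,
    }
}

/// Fold null checks on literals into boolean literals,
/// e.g. `2 is null` becomes `false` and `2 is not null` becomes `true`.
fn simplify(expr: Expr) -> Expr {
    match expr {
        Expr::IsNull(inner) => match simplify(*inner) {
            Expr::Int(v) => Expr::Bool(v.is_none()),
            other => Expr::IsNull(Box::new(other)),
        },
        Expr::IsNotNull(inner) => match simplify(*inner) {
            Expr::Int(v) => Expr::Bool(v.is_some()),
            other => Expr::IsNotNull(Box::new(other)),
        },
        other => other,
    }
}

/// Assumed pruning rule: a file can be skipped only when the predicate
/// is already a literal `false`.
fn can_prune_file(predicate: &Expr) -> bool {
    matches!(predicate, Expr::Bool(false))
}
```

In this sketch, `a is null` with `a = 2` becomes `2 is null` after substitution, which the toy pruner cannot act on. Only after `simplify` folds it to `false` does `can_prune_file` return `true`. The `a is not null` case folds to `true`, so it never prunes either way, which is why the gain is specific to the `is null` form.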
