alamb commented on issue #14993: URL: https://github.com/apache/datafusion/issues/14993#issuecomment-2702219091
> Maybe this is what you meant but in my mind it's possible to do even better: instead of evaluating it during the scan, the file might even contain a pre-evaluated version of it.

Yes indeed! Great point.

> Who is responsible for evaluating these expressions if they can vary on a per-file basis?
>
> If the TableProvider says "yes, I can evaluate that expression" it is then responsible for doing the compute to evaluate it for every single file.

Yes, this is how I would expect it to work. The table provider would have to figure out the best way to evaluate the projection depending on its actual layout for variants.

> Maybe that's not an issue but I did want to point out that it blurs the lines of where IO happens and where compute happens. If this is a problem I think it would complicate the API substantially.

I agree: implementing this optimally in a TableProvider will be complex.

I note that IO and CPU are already intertwined when implementing something like filter pushdown in Parquet, so I am not sure that also pushing down expressions makes the problem worse (or better).
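A minimal Rust sketch of the per-file responsibility described above, assuming a hypothetical provider that has accepted a pushed-down "get field" expression on a variant column. `FileLayout`, `GetField`, and `scan` are illustrative names, not DataFusion APIs. Each file either already stores the expression's result (the pre-evaluated case) or stores raw variant values that the provider must decode and evaluate at scan time.

```rust
use std::collections::HashMap;

/// Hypothetical per-file layout for a variant column (not a DataFusion type).
enum FileLayout {
    /// The writer already stored the result of this exact expression
    /// (assumption: e.g. a shredded field), so the scan only has to read it.
    PreEvaluated(Vec<Option<i64>>),
    /// Only raw variant values are stored, modeled here as field -> value maps;
    /// the provider must compute the expression itself while scanning.
    Raw(Vec<HashMap<String, i64>>),
}

/// Hypothetical pushed-down expression: "extract field `name` from the variant".
struct GetField {
    name: String,
}

impl GetField {
    /// Evaluates the expression for one file, picking a strategy from that
    /// file's layout. Both branches return one `Option<i64>` per row.
    fn evaluate(&self, file: &FileLayout) -> Vec<Option<i64>> {
        match file {
            // IO only: the answer is already materialized in the file.
            FileLayout::PreEvaluated(col) => col.clone(),
            // IO + compute: look up the field in every decoded row.
            FileLayout::Raw(rows) => rows
                .iter()
                .map(|row| row.get(&self.name).copied())
                .collect(),
        }
    }
}

/// Once the provider accepts the expression, it must evaluate it for every
/// file in the scan, whatever each file's layout happens to be.
fn scan(expr: &GetField, files: &[FileLayout]) -> Vec<Option<i64>> {
    files.iter().flat_map(|f| expr.evaluate(f)).collect()
}
```

The `match` in `evaluate` is where the IO/compute boundary blurs: for some files the scan is pure IO, for others it does the expression's compute inline, and the caller of `scan` cannot tell which happened.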