alamb commented on issue #14993:
URL: https://github.com/apache/datafusion/issues/14993#issuecomment-2702219091

   > Maybe this is what you meant, but in my mind it's possible to do even better: 
instead of evaluating it during the scan, the file might even contain a 
pre-evaluated version of it. 
   
   Yes indeed! Great point.
   
   > Who is responsible for evaluating these expressions if they can vary on a 
per-file basis?
   > If the TableProvider says "yes, I can evaluate that expression" it is then 
responsible for doing the compute to evaluate it for every single file. 
   
   Yes, this is how I would expect it to work.
   
   The table provider would have to figure out the best way to evaluate the 
projection depending on its actual layout
   
   for variants
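
As a rough illustration of that per-file decision, here is a minimal sketch in plain Rust. It does not use any DataFusion API: `FileLayout`, `EvalPlan`, and `plan_for_file` are hypothetical names invented for this example, and it assumes the provider can tell from file metadata whether a pre-evaluated column exists for a given expression.

```rust
use std::collections::HashMap;

/// Hypothetical description of one file in a table (not a DataFusion type).
struct FileLayout {
    /// Maps an expression (as text) to the column that already stores its result.
    precomputed: HashMap<String, String>,
}

/// How the provider would produce the expression's values for one file.
#[derive(Debug, PartialEq)]
enum EvalPlan {
    /// The file already stores the result; just read that column.
    ReadPrecomputed { column: String },
    /// No stored result; read the source data and evaluate during the scan.
    ComputeDuringScan { expr: String },
}

/// Choose a plan per file, preferring a pre-evaluated column when one exists.
fn plan_for_file(expr: &str, file: &FileLayout) -> EvalPlan {
    match file.precomputed.get(expr) {
        Some(column) => EvalPlan::ReadPrecomputed {
            column: column.clone(),
        },
        None => EvalPlan::ComputeDuringScan {
            expr: expr.to_string(),
        },
    }
}
```

Under these assumptions, two files in the same table can get different plans for the same expression, which is why the provider (rather than the planner) ends up owning the choice.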
   
   > Maybe that's not an issue but I did want to point out that it blurs the 
lines of where IO happens and where compute happens. If this is a problem I 
think it would complicate the API substantially.
   
   I agree -- implementing this optimally in a TableProvider will be complex
   
   I note that IO and CPU work are already intertwined when implementing something 
like filter pushdown in Parquet, so I am not sure that also pushing down expressions 
makes the problem worse (or better)
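
To make the intertwining concrete, here is a small sketch of statistics-based pruning, one common form of filter pushdown. It is plain Rust with hypothetical types (`RowGroupStats`, `may_match_range`, `row_groups_to_read`), not the parquet crate's API, and it assumes each row group carries min/max statistics for a single integer column.

```rust
/// Hypothetical min/max statistics for one integer column in one row group.
struct RowGroupStats {
    min: i64,
    max: i64,
}

/// Compute step: using statistics only, decide whether any row in the
/// row group could satisfy `lo <= col <= hi`.
fn may_match_range(stats: &RowGroupStats, lo: i64, hi: i64) -> bool {
    stats.max >= lo && stats.min <= hi
}

/// The result of the compute step decides the IO: only the returned
/// row group indices need to be fetched and decoded.
fn row_groups_to_read(groups: &[RowGroupStats], lo: i64, hi: i64) -> Vec<usize> {
    groups
        .iter()
        .enumerate()
        .filter(|(_, stats)| may_match_range(stats, lo, hi))
        .map(|(index, _)| index)
        .collect()
}
```

Even in this toy version, the predicate has to be evaluated before the reader knows which bytes to request, so the IO and compute steps cannot be cleanly separated.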
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

