adriangb commented on issue #14993:
URL: https://github.com/apache/datafusion/issues/14993#issuecomment-3028379524

   Looking into this I think it will be easier than anticipated. In particular, 
we already have `ExecutionPlan::try_swapping_with_projection` and that is 
implemented all the way down to `FileScanConfig`:
   
   
https://github.com/apache/datafusion/blob/83625dd5500daaa18e4081c9e3544bbe488aefdc/datafusion/datasource/src/file_scan_config.rs#L584-L619
   
   So I think the only bit that's needed is to add APIs, or refactor the 
existing APIs, for `FileSource`s to accept a `Vec<Arc<dyn PhysicalExpr>>` 
pulled out of the `ProjectionExec` instead of only accepting columns / a 
`Vec<usize>`.
   Then inside e.g. `ParquetOpener` we can re-use the machinery introduced in 
https://github.com/apache/datafusion/pull/16461 and all of the future 
optimizations / rewrites we'll be incorporating there to actually evaluate the 
expressions during the scan just as `ProjectionExec` would but with awareness 
of each file's physical schema.
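
   To illustrate the shape of the API change, here is a minimal, self-contained 
sketch. It does not use DataFusion: `ToyExpr`, `Col`, `Add` and `ToyScan` are 
hypothetical stand-ins for `PhysicalExpr`, a column reference, a computed 
expression and a file source, and rows of `i64` stand in for record batches. 
It only shows that an index-based projection is a special case of an 
expression-based one, so a scan that accepts expressions can serve both.

```rust
use std::sync::Arc;

/// Toy stand-in for a physical expression; NOT DataFusion's `PhysicalExpr` trait.
pub trait ToyExpr {
    /// Evaluate the expression against one row of the file's columns.
    fn evaluate(&self, row: &[i64]) -> i64;
}

/// Column reference by index: all that a `Vec<usize>` projection can express.
pub struct Col(pub usize);

impl ToyExpr for Col {
    fn evaluate(&self, row: &[i64]) -> i64 {
        row[self.0]
    }
}

/// A computed expression such as `a + b`, which an index-only projection
/// cannot represent and which would otherwise stay in a node above the scan.
pub struct Add(pub Arc<dyn ToyExpr>, pub Arc<dyn ToyExpr>);

impl ToyExpr for Add {
    fn evaluate(&self, row: &[i64]) -> i64 {
        self.0.evaluate(row) + self.1.evaluate(row)
    }
}

/// Hypothetical scan over in-memory rows, standing in for a file source.
pub struct ToyScan {
    rows: Vec<Vec<i64>>,
    projection: Vec<Arc<dyn ToyExpr>>,
}

impl ToyScan {
    /// Index-based projection, lowered to plain column expressions.
    pub fn with_column_projection(rows: Vec<Vec<i64>>, indices: &[usize]) -> Self {
        let projection: Vec<Arc<dyn ToyExpr>> = indices
            .iter()
            .map(|&i| Arc::new(Col(i)) as Arc<dyn ToyExpr>)
            .collect();
        Self { rows, projection }
    }

    /// Expression-based projection, as if pulled out of a projection node.
    pub fn with_expr_projection(
        rows: Vec<Vec<i64>>,
        projection: Vec<Arc<dyn ToyExpr>>,
    ) -> Self {
        Self { rows, projection }
    }

    /// Evaluate every projection expression against every row during the scan.
    pub fn scan(&self) -> Vec<Vec<i64>> {
        self.rows
            .iter()
            .map(|row| self.projection.iter().map(|e| e.evaluate(row)).collect())
            .collect()
    }
}
```

   In this toy model, `ToyScan::with_column_projection(rows, &[2, 0])` and 
`ToyScan::with_expr_projection(rows, vec![...])` go through the same `scan` 
path. The real change would additionally have to handle per-file schema 
differences, which this sketch ignores.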


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

