adriangb commented on issue #14993: URL: https://github.com/apache/datafusion/issues/14993#issuecomment-3028379524
Looking into this, I think it will be easier than anticipated. In particular, we already have `ExecutionPlan::try_swapping_with_projection`, and it is implemented all the way down to `FileScanConfig`:

https://github.com/apache/datafusion/blob/83625dd5500daaa18e4081c9e3544bbe488aefdc/datafusion/datasource/src/file_scan_config.rs#L584-L619

So I think the only piece that's needed is to add new APIs, or refactor the existing ones, so that `FileSource`s accept a `Vec<Arc<dyn PhysicalExpr>>` pulled out of the `ProjectionExec` instead of only accepting columns / a `Vec<usize>`.

Inside e.g. `ParquetOpener`, we can reuse the machinery introduced in https://github.com/apache/datafusion/pull/16461, along with all of the future optimizations / rewrites we'll be incorporating there, to actually evaluate the expressions during the scan just as `ProjectionExec` would, but with awareness of each file's physical schema.
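To make the proposed API change concrete, here is a minimal, self-contained sketch. It does not use DataFusion: `PhysicalExpr`, `Column`, `Add`, `as_column_indices`, and `scan_file_with_projection` are hypothetical, simplified stand-ins written for illustration, not DataFusion's actual types or signatures. It contrasts the current shape (a scan can only take column indices, so `a + b` cannot be pushed down) with the proposed one (the scan receives the expressions and evaluates them per file against that file's physical schema). Treating a column that is missing from a file as NULL is an assumption about how schema adaptation would behave.

```rust
use std::sync::Arc;

// Hypothetical, simplified stand-in for DataFusion's `PhysicalExpr` trait.
trait PhysicalExpr {
    // Evaluate against one row laid out according to the *file's* physical schema.
    fn evaluate(&self, file_schema: &[&str], row: &[i64]) -> Option<i64>;
    // If this expression is a bare column reference, return its name.
    fn as_column(&self) -> Option<&str> {
        None
    }
}

// Hypothetical column reference, resolved by name rather than by index.
struct Column {
    name: String,
}

impl PhysicalExpr for Column {
    fn evaluate(&self, file_schema: &[&str], row: &[i64]) -> Option<i64> {
        // Look the column up in *this file's* schema; if the file lacks it,
        // yield NULL (None). This is an assumed schema-adaptation policy.
        file_schema
            .iter()
            .position(|c| *c == self.name.as_str())
            .map(|i| row[i])
    }
    fn as_column(&self) -> Option<&str> {
        Some(&self.name)
    }
}

// Hypothetical binary `left + right`; NULL if either side is NULL.
struct Add {
    left: Arc<dyn PhysicalExpr>,
    right: Arc<dyn PhysicalExpr>,
}

impl PhysicalExpr for Add {
    fn evaluate(&self, file_schema: &[&str], row: &[i64]) -> Option<i64> {
        Some(self.left.evaluate(file_schema, row)? + self.right.evaluate(file_schema, row)?)
    }
}

// Current shape: a projection can only be pushed into the scan if every
// expression is a plain column, expressible as indices into the table schema.
fn as_column_indices(exprs: &[Arc<dyn PhysicalExpr>], table_schema: &[&str]) -> Option<Vec<usize>> {
    exprs
        .iter()
        .map(|e| {
            let name = e.as_column()?;
            table_schema.iter().position(|c| *c == name)
        })
        .collect()
}

// Proposed shape: the scan receives the expressions themselves and evaluates
// them per file, against that file's physical schema, as ProjectionExec would.
fn scan_file_with_projection(
    exprs: &[Arc<dyn PhysicalExpr>],
    file_schema: &[&str],
    rows: &[Vec<i64>],
) -> Vec<Vec<Option<i64>>> {
    rows.iter()
        .map(|row| exprs.iter().map(|e| e.evaluate(file_schema, row)).collect())
        .collect()
}
```

For example, projecting `a + b` over a file stored with columns `[b, a]` and row `[10, 1]` yields `Some(11)`, while a file that lacks `b` yields `None` for that expression. The `Vec<usize>` path cannot express `a + b` at all, which is why it currently stays in `ProjectionExec`.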