adriangb commented on issue #15952: URL: https://github.com/apache/datafusion/issues/15952#issuecomment-3180087675
My proposal for this, which I've posted in a few other places:

1. Make good shared helpers for the things that the data sources do in common.
   1. We already have a good story for filter pushdown, I think.
   2. For projection pushdown, I think we should make some of the stuff in `ProjectionExec` public / easy to share.
   3. Touch up `FileScanConfig` as a shared builder / config for all sources (excluding things like CSV- or Parquet-specific options).
2. Move logic from `DataSourceExec` / `FileScanConfig` into the individual `FileSources`.
3. Rip out `FileScanConfig` from the execution pathway.
4. Do little refactors like https://github.com/apache/datafusion/pull/17076

One question I'm not sure about is how to go about all of this: either make small changes, trying to keep backwards compatibility but necessarily breaking a lot of things, or make a new parallel implementation where we try to preserve the high-level APIs and then eventually deprecate / replace the current ones. The latter seems to be what we did in the last refactor and, based on the discussion in #17076, may be the easiest path forward.
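To make the "new parallel implementation" option a bit more concrete, here is a rough Rust sketch of the general pattern: a new trait that owns the logic, plus a blanket adapter so the old entry point keeps working while it is deprecated. Every name in it (`LegacySource`, `NewSource`, `CsvLike`, `scan`, `scan_legacy`) is made up for illustration and is not a DataFusion API; the real traits and signatures would look quite different.

```rust
// Illustrative only: every name here is hypothetical, not a DataFusion API.

/// Stand-in for the current high-level API that callers already depend on.
#[deprecated(note = "implement `NewSource` instead")]
pub trait LegacySource {
    fn scan_legacy(&self, projection: Option<&[usize]>) -> Vec<String>;
}

/// Stand-in for the new pathway, where each source owns its own logic.
pub trait NewSource {
    /// Column names this source can produce.
    fn columns(&self) -> Vec<String>;

    /// Shared helper: apply a projection by column index, skipping
    /// out-of-range indices. `None` means "all columns".
    fn scan(&self, projection: Option<&[usize]>) -> Vec<String> {
        let cols = self.columns();
        match projection {
            Some(indices) => indices
                .iter()
                .filter_map(|&i| cols.get(i).cloned())
                .collect(),
            None => cols,
        }
    }
}

/// Blanket adapter: anything on the new pathway still satisfies the old
/// API, so existing callers keep compiling while the deprecation
/// attribute nudges them to migrate.
#[allow(deprecated)]
impl<T: NewSource> LegacySource for T {
    fn scan_legacy(&self, projection: Option<&[usize]>) -> Vec<String> {
        self.scan(projection)
    }
}

/// Toy source used only to exercise the sketch.
pub struct CsvLike;

impl NewSource for CsvLike {
    fn columns(&self) -> Vec<String> {
        vec!["a".to_string(), "b".to_string(), "c".to_string()]
    }
}

#[allow(deprecated)]
fn main() {
    let source = CsvLike;
    // New pathway, projecting the first and third columns.
    println!("{:?}", source.scan(Some(&[0, 2][..])));
    // Old entry point, routed through the adapter.
    println!("{:?}", source.scan_legacy(None));
}
```

The point of the adapter is that the old and new pathways can coexist for a release or two, which is the trade-off against doing many small breaking changes in place.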