adriangb commented on issue #15952:
URL: https://github.com/apache/datafusion/issues/15952#issuecomment-3180087675

   My proposal for this, which I've also posted in a few other places:
   1. Make good shared helpers for the things that the data sources do in common.
     1. We already have a good story for filter pushdown, I think.
     2. For projection pushdown, I think we should make some of the stuff in `ProjectionExec` public / easy to share.
     3. Touch up `FileScanConfig` as a shared builder / config for all sources (excluding things like CSV- or Parquet-specific options).
   2. Move logic from `DataSourceExec` / `FileScanConfig` into the individual `FileSources`.
   3. Rip out `FileScanConfig` from the execution pathway.
   4. Do little refactors like https://github.com/apache/datafusion/pull/17076
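
To make steps 1 and 2 a bit more concrete, here is a minimal, self-contained sketch of the shape I have in mind. Every name in it (`project_schema`, the `FileSource` trait as written here, `ColumnarSource`) is hypothetical and simplified for illustration; none of it is the actual DataFusion API, and schemas are reduced to plain column-name lists.

```rust
// Hypothetical sketch: these names are illustrative, not DataFusion APIs.

/// Step 1: a shared helper that any source can reuse. It maps a projection
/// (a list of column indices) onto a schema, here just a list of column names.
fn project_schema(file_schema: &[String], projection: &[usize]) -> Result<Vec<String>, String> {
    projection
        .iter()
        .map(|&i| {
            file_schema
                .get(i)
                .cloned()
                .ok_or_else(|| format!("projection index {i} out of bounds"))
        })
        .collect()
}

/// Step 2: each source owns its pushdown logic instead of a central config.
trait FileSource {
    fn schema(&self) -> &[String];

    /// Returns a new source with the projection applied, or `None` if this
    /// source cannot absorb the projection.
    fn try_push_projection(&self, projection: &[usize]) -> Option<Box<dyn FileSource>>;
}

/// A stand-in for a format that can read a subset of columns.
struct ColumnarSource {
    columns: Vec<String>,
}

impl FileSource for ColumnarSource {
    fn schema(&self) -> &[String] {
        &self.columns
    }

    fn try_push_projection(&self, projection: &[usize]) -> Option<Box<dyn FileSource>> {
        // The source decides for itself, but delegates the mechanics to the shared helper.
        let columns = project_schema(&self.columns, projection).ok()?;
        Some(Box::new(ColumnarSource { columns }))
    }
}
```

The point of the split is that the decision (can this source absorb the projection?) lives in the source, while the mechanics live in a helper that every source can call.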
   
   One thing I'm not sure about is how to go about all of this. Either we make small changes, trying to keep backwards compatibility but necessarily breaking a lot of things, or we build a new parallel implementation where we try to preserve the high-level APIs and then eventually deprecate / replace the current ones. The latter seems to be what we did in the last refactor, and based on the discussion in #17076 it may be the easiest path forward.
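
For illustration only, the parallel-implementation route could look roughly like the sketch below: the new type is built alongside the old entry point, and the old entry point stays as a thin, deprecated wrapper that forwards to it. `ScanV2`, `legacy_scan`, and the default batch size of 8192 are all made-up placeholders, not existing DataFusion items.

```rust
// Hypothetical names throughout; not DataFusion APIs.

/// New implementation, developed in parallel with the old one.
pub struct ScanV2 {
    pub paths: Vec<String>,
    pub batch_size: usize,
}

impl ScanV2 {
    pub fn new(paths: Vec<String>) -> Self {
        // 8192 is an arbitrary placeholder default for this sketch.
        Self { paths, batch_size: 8192 }
    }

    pub fn with_batch_size(mut self, batch_size: usize) -> Self {
        self.batch_size = batch_size;
        self
    }
}

/// Old entry point, kept so existing callers still compile. It only forwards
/// to the new implementation and can be removed after a deprecation period.
#[deprecated(note = "use ScanV2::new(..).with_batch_size(..) instead")]
pub fn legacy_scan(paths: Vec<String>, batch_size: usize) -> ScanV2 {
    ScanV2::new(paths).with_batch_size(batch_size)
}
```

Callers of the old function get a compiler warning rather than a break, which is the trade-off that makes this route attractive compared with many small breaking changes.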


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

