adriangb commented on issue #15952:
URL: https://github.com/apache/datafusion/issues/15952#issuecomment-3180087675
My proposal for this, which I've also posted in a few other places:
1. Make good shared helpers for the things that the data sources do in
   common.
   1. We have a good story for filter pushdown already I think.
   2. For projection pushdown I think we should make some of the stuff in
      `ProjectionExec` public / easy to share.
   3. Touch up `FileScanConfig` as a shared builder / config for all sources
      (excluding things like CSV or parquet specific options)
2. Move logic from `DataSourceExec` / `FileScanConfig` into the individual
   `FileSources`
3. Rip out `FileScanConfig` from the execution pathway
4. Do little refactors like https://github.com/apache/datafusion/pull/17076
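
To make steps 1 and 2 a bit more concrete, here is a rough, self-contained sketch of the shape I have in mind. Everything in it (`project_schema`, `SharedScanConfig`, `ToySource`, `ToyCsvSource`) is a hypothetical stand-in invented for illustration, not an existing DataFusion API, and schemas are reduced to plain column-name lists. The idea it tries to show: a shared helper does the projection bookkeeping, the shared config holds only format-agnostic settings, and each source owns its own pushdown logic and format-specific options.

```rust
// Hypothetical sketch only: none of these names exist in DataFusion.

/// Shared helper (step 1.2): apply a projection, given as column indices,
/// to a schema, given here as a plain list of column names.
/// Returns `None` if any index is out of bounds.
pub fn project_schema(schema: &[String], projection: &[usize]) -> Option<Vec<String>> {
    projection.iter().map(|&i| schema.get(i).cloned()).collect()
}

/// Format-agnostic settings shared by all sources (step 1.3).
/// A stand-in for a slimmed-down `FileScanConfig`.
#[derive(Debug, Clone, Default)]
pub struct SharedScanConfig {
    pub file_schema: Vec<String>,
    /// Indices into `file_schema`; `None` means "all columns".
    pub projection: Option<Vec<usize>>,
    pub limit: Option<usize>,
}

/// Each source implements its own pushdown (step 2) instead of relying on
/// a central config or exec node to do it.
pub trait ToySource {
    /// Returns a new source with the projection applied, or `None` if the
    /// projection refers to columns the file schema does not have.
    fn with_projection(&self, projection: Vec<usize>) -> Option<Self>
    where
        Self: Sized;

    /// Column names this source will produce.
    fn output_schema(&self) -> Vec<String>;
}

#[derive(Debug, Clone)]
pub struct ToyCsvSource {
    pub shared: SharedScanConfig,
    /// Format-specific option: lives on the source, not in the shared config.
    pub delimiter: u8,
}

impl ToySource for ToyCsvSource {
    fn with_projection(&self, projection: Vec<usize>) -> Option<Self> {
        // Validate with the shared helper before accepting the projection.
        project_schema(&self.shared.file_schema, &projection)?;
        let mut next = self.clone();
        next.shared.projection = Some(projection);
        Some(next)
    }

    fn output_schema(&self) -> Vec<String> {
        match &self.shared.projection {
            Some(p) => project_schema(&self.shared.file_schema, p).unwrap_or_default(),
            None => self.shared.file_schema.clone(),
        }
    }
}
```

In this toy version a projection always indexes into the file schema and replaces any earlier projection; a real implementation would also need to decide how projections compose.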
One question I'm not sure about is how to go about all of this. Either we make
small changes, trying to keep backwards compatibility but necessarily breaking
a lot of things, or we make a new parallel implementation where we try to
preserve the high-level APIs and then eventually deprecate / replace the
current ones. The latter seems to be what we did in the last refactor and,
based on the discussion in #17076, may be the easiest path forward.