adriangb commented on PR #17076: URL: https://github.com/apache/datafusion/pull/17076#issuecomment-3168950104
Right, big picture: I think it's quite evident that there are large issues with the current design: coupling, circular references, etc. Just take a look at this: https://github.com/apache/datafusion/blob/407a965d3740634d582376aa22c4b2b57da6f005/datafusion/datasource/src/source.rs#L74-L122

@xudong963 made this diagram (thank you again) because we were all having trouble wrapping our heads around how these things are related.

Just yesterday I ran into a gnarly bug in this area that's probably going to be really hard to unravel because logic / information is split across multiple places: https://github.com/apache/datafusion/issues/17077

And this complexity is blocking important work, e.g. https://github.com/apache/datafusion/issues/14993

All this is to say: the status quo is not great. The thing I'm struggling with is how to improve it. Unfortunately, small incremental improvements (like what this PR attempts to do) will result in a lot of churn for users. Maybe a better approach is to work on a greenfield replacement that attempts to minimize the final API churn? I'm not sure; open to ideas.