adriangb commented on PR #17076:
URL: https://github.com/apache/datafusion/pull/17076#issuecomment-3168950104

   Right, big picture: I think it's quite evident that the current design has 
large issues (coupling, circular references, etc.). Just take a look at this:
   
   
https://github.com/apache/datafusion/blob/407a965d3740634d582376aa22c4b2b57da6f005/datafusion/datasource/src/source.rs#L74-L122
   
   @xudong963 made this diagram (thank you again) because we were all having 
trouble wrapping our heads around how these things are related. Just yesterday 
I ran into a gnarly bug in this area that's probably going to be really hard to 
unravel because the logic and information are split across multiple places: 
https://github.com/apache/datafusion/issues/17077. This complexity is also 
blocking important work, e.g. https://github.com/apache/datafusion/issues/14993. 
All this is to say: the status quo is not great. The thing I'm struggling with 
is how to improve it. Unfortunately, small incremental improvements (like what 
this PR attempts) will result in a lot of churn for users. Maybe a better 
approach is to work on a greenfield replacement that attempts to minimize the 
final API churn? I'm not sure; I'm open to ideas.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

