andygrove commented on issue #23194: URL: https://github.com/apache/datafusion/issues/23194#issuecomment-4855994992
More thoughts on this, and I apologize if I am repeating points already made, but I think it makes total sense to have specialized physical plans for in-process vs distributed execution. The challenge currently is that each distributed project starts with the DataFusion (DF) in-process plan, converts it, and adds hacks to work around the differing assumptions between in-process and distributed execution.

I would very much like DF to be a good foundation for distributed query engines (that was always the goal). To facilitate this, I think it makes sense to have a new optional crate in DF core specifically for this purpose. Initially, it can be based on the existing approach of taking the DF in-process plan and converting it to a distributed plan, but over time this could evolve into having specialized versions of operators for distributed use.

I am going to build a POC today and open a draft PR to demonstrate the idea and see what people think.
