andygrove commented on issue #23194:
URL: https://github.com/apache/datafusion/issues/23194#issuecomment-4855994992

   More thoughts on this, and I apologize if I am repeating points already 
made: I think it makes total sense to have specialized physical plans for 
in-process vs distributed execution. The challenge currently is that each 
distributed project starts with the DF in-process plan, converts it, and then 
adds hacks to work around the differing assumptions between in-process and 
distributed execution. I would very much like DF to be a good foundation for 
distributed query engines (that was always the goal). 
   
   To facilitate this, I think it makes sense to have a new optional crate in 
DF core specifically for this purpose. Initially, it can be based on the 
existing approach of taking the DF in-process plan and converting it to a 
distributed plan, but over time this could evolve into having specialized 
versions of operators for distributed use.
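
   To make the "convert the in-process plan" approach concrete, here is a 
minimal sketch using a toy plan model. Every type and function name below 
(`InProcessPlan`, `DistributedPlan`, `to_distributed`, `shuffle_count`) is 
hypothetical and is not a DataFusion API; the sketch only assumes that an 
in-process repartition is the point where a distributed engine would place a 
network shuffle between stages.

```rust
// Toy model only: these types are hypothetical and are NOT DataFusion APIs.

/// A simplified in-process physical plan.
#[derive(Debug, Clone, PartialEq)]
enum InProcessPlan {
    Scan { table: String },
    Filter { input: Box<InProcessPlan> },
    Aggregate { input: Box<InProcessPlan> },
    /// Moves data between partitions inside a single process.
    Repartition { input: Box<InProcessPlan>, partitions: usize },
}

/// A simplified distributed physical plan.
#[derive(Debug, Clone, PartialEq)]
enum DistributedPlan {
    Scan { table: String },
    Filter { input: Box<DistributedPlan> },
    Aggregate { input: Box<DistributedPlan> },
    /// Stage boundary: the upstream stage writes its output partitions and
    /// the downstream stage reads them over the network.
    ShuffleRead { stage: Box<DistributedPlan>, partitions: usize },
}

/// Converts an in-process plan by rewriting each in-process repartition
/// into a shuffle boundary and copying every other operator unchanged.
fn to_distributed(plan: &InProcessPlan) -> DistributedPlan {
    match plan {
        InProcessPlan::Scan { table } => DistributedPlan::Scan {
            table: table.clone(),
        },
        InProcessPlan::Filter { input } => DistributedPlan::Filter {
            input: Box::new(to_distributed(input)),
        },
        InProcessPlan::Aggregate { input } => DistributedPlan::Aggregate {
            input: Box::new(to_distributed(input)),
        },
        InProcessPlan::Repartition { input, partitions } => {
            DistributedPlan::ShuffleRead {
                stage: Box::new(to_distributed(input)),
                partitions: *partitions,
            }
        }
    }
}

/// Counts the shuffle boundaries in a distributed plan.
fn shuffle_count(plan: &DistributedPlan) -> usize {
    match plan {
        DistributedPlan::Scan { .. } => 0,
        DistributedPlan::Filter { input } | DistributedPlan::Aggregate { input } => {
            shuffle_count(input)
        }
        DistributedPlan::ShuffleRead { stage, .. } => 1 + shuffle_count(stage),
    }
}
```

   In this toy model, an aggregate over a repartitioned, filtered scan 
converts to a plan with a single shuffle boundary. Specialized distributed 
operators would replace the one-to-one copying arms of `to_distributed` with 
operators written for distributed execution from the start.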
   
   I am going to build a proof of concept today and open a draft PR to 
demonstrate the idea and see what people think.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

