realno commented on pull request #1560:
URL: 
https://github.com/apache/arrow-datafusion/pull/1560#issuecomment-1013486440


   @yahoNanJing this is a very well-written document, great work!
   
   I am wondering whether you investigated any options based on the original 
poll model and, if so, what you found. I think the poll/pull model has many 
benefits:
   
   - The scheduler and executors are better decoupled. The scheduler does not 
need to know anything about the executors; its job is to construct and 
optimize the plan. The executors, in turn, only need to know where to get 
tasks, which can be further abstracted behind a queuing or messaging system. 
It is a fairly clean design and can scale pretty well.
   - Minimal state is maintained within the system, which helps its 
stability and resilience.
   - The complexity of the system is low compared to the push model.
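   
   To make the decoupling concrete, here is a minimal sketch of the poll model 
using only the Rust standard library. `Task`, `TaskQueue`, and 
`run_poll_model` are hypothetical names for illustration and are not part of 
Ballista; the in-memory queue stands in for whatever queuing or messaging 
system would sit between the scheduler and the executors.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical task; a real one would carry a fragment of the query plan.
struct Task {
    id: usize,
}

/// Hypothetical shared queue: the only thing scheduler and executors share.
#[derive(Clone, Default)]
struct TaskQueue {
    inner: Arc<Mutex<VecDeque<Task>>>,
}

impl TaskQueue {
    /// Scheduler side: publish a task without knowing who will run it.
    fn push(&self, task: Task) {
        self.inner.lock().unwrap().push_back(task);
    }

    /// Executor side: ask for the next task, if any.
    fn poll(&self) -> Option<Task> {
        self.inner.lock().unwrap().pop_front()
    }
}

/// Enqueues `num_tasks` tasks, lets `num_executors` threads poll them,
/// and returns the ids of the completed tasks in ascending order.
fn run_poll_model(num_tasks: usize, num_executors: usize) -> Vec<usize> {
    let queue = TaskQueue::default();

    // "Scheduler": only constructs tasks and puts them on the queue.
    for id in 0..num_tasks {
        queue.push(Task { id });
    }

    let completed = Arc::new(Mutex::new(Vec::new()));

    // "Executors": each one only knows where to get tasks.
    let handles: Vec<_> = (0..num_executors)
        .map(|_| {
            let queue = queue.clone();
            let completed = Arc::clone(&completed);
            thread::spawn(move || {
                // This sketch stops once the queue is empty; a long-running
                // executor would instead wait and poll again.
                while let Some(task) = queue.poll() {
                    completed.lock().unwrap().push(task.id);
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let mut done = completed.lock().unwrap().clone();
    done.sort_unstable();
    done
}
```

   Calling `run_poll_model(8, 3)` runs eight tasks across three executor 
threads. The scheduler side keeps no per-executor state, so in this sketch 
adding or removing executors only changes how many threads poll the queue.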
   
   Regarding the original issue, I see a good reason to try to reduce CPU 
usage. As for query time, is it really that critical for DataFusion use cases? 
IMO we should optimize for large distributed jobs, so perhaps we can live with 
a few milliseconds of delay here and there.
   
   Again, thanks for the proposal. I am curious about what you and other 
contributors think.
   
   BTW, I have recently been thinking about making Ballista production-ready 
and having it work well with modern cloud-native architectures, and I think 
you are interested in the same topic. I would be happy to discuss it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

