StephanEwen commented on issue #8740: [FLINK-12763][runtime] Fail job 
immediately if tasks’ resource needs can not be satisfied.
URL: https://github.com/apache/flink/pull/8740#issuecomment-508656222
 
 
   I think the difference between batch and streaming should not manifest in 
the ResourceManager.
   
   It can manifest in the scheduler, so let's see if we can cover this there. 
What do you think about this approach:
   
     - When the scheduler requests slots from the SlotPool, it uses a timeout. 
     - For streaming, that timeout is finite (you want a "NotEnoughResourcesAvailable" 
exception rather soon).
     - For batch, it is infinite, because the same slots can be used by one task 
after another.
     - Failures of the ResourceManager to allocate a slot (timeout or any other 
error) only cancel the Future. This failure is not propagated to the request 
from the scheduler.
   
     - Open issue: how to ensure that there is at least one slot of the 
relevant size.
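The approach above can be sketched with plain JDK futures. This is a minimal, hypothetical sketch: `SlotRequestSketch`, `JobType`, `PendingRequest`, `requestSlot` and `onResourceManagerAllocation` are illustrative names, not Flink APIs, and slots are modeled as `String` ids. It shows a finite scheduler-side timeout for streaming (mapped to a "NotEnoughResourcesAvailable" exception), no timeout for batch, and ResourceManager allocation failures that end only that attempt without failing the scheduler's request. It relies on `CompletableFuture.copy()` and `orTimeout()` (Java 9+).

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: all class and method names here are illustrative, not Flink APIs.
public class SlotRequestSketch {

    enum JobType { STREAMING, BATCH }

    static class NotEnoughResourcesAvailableException extends RuntimeException {
        NotEnoughResourcesAvailableException(String message) { super(message); }
    }

    /** Unwraps the CompletionException that dependent CompletableFuture stages may add. */
    private static Throwable unwrap(Throwable t) {
        return (t instanceof CompletionException && t.getCause() != null) ? t.getCause() : t;
    }

    /**
     * Scheduler-side slot request. Streaming gets a finite timeout that surfaces as
     * NotEnoughResourcesAvailableException; batch waits indefinitely, because a slot
     * freed by one task can be reused by the next.
     */
    static CompletableFuture<String> requestSlot(
            CompletableFuture<String> slotPoolFuture, JobType jobType, Duration streamingTimeout) {
        if (jobType == JobType.BATCH) {
            return slotPoolFuture; // no timeout for batch
        }
        // copy() prevents the timeout from completing the slot pool's own future.
        return slotPoolFuture.copy()
                .orTimeout(streamingTimeout.toMillis(), TimeUnit.MILLISECONDS)
                .exceptionally(t -> {
                    Throwable cause = unwrap(t);
                    if (cause instanceof TimeoutException) {
                        throw new NotEnoughResourcesAvailableException(
                                "No slot within " + streamingTimeout);
                    }
                    throw new CompletionException(cause);
                });
    }

    /** A slot pool request as seen by the scheduler. */
    static final class PendingRequest {
        final CompletableFuture<String> schedulerFuture = new CompletableFuture<>();
    }

    /**
     * Connects one ResourceManager allocation attempt to a pending request. A failed
     * attempt (timeout or any other error) only ends that attempt; the scheduler's
     * future stays pending so a slot that becomes available later can still fulfill it.
     */
    static void onResourceManagerAllocation(
            CompletableFuture<String> rmAllocation, PendingRequest request) {
        rmAllocation.whenComplete((slot, failure) -> {
            if (failure == null) {
                request.schedulerFuture.complete(slot);
            }
            // failure != null: intentionally not propagated to request.schedulerFuture
        });
    }
}
```

Keeping the timeout on the scheduler side means the SlotPool and ResourceManager stay identical for batch and streaming; only the timeout value the scheduler passes differs.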
   
   Long Term Approach
     - We want to change the SlotPool such that you can set something like 
       `min: x slots of profile a and y slots of profile b`
       `preferred: k slots of profile a, i slots of profile b`
     - That is also how we would grow resources before triggering rescaling in 
streaming auto-scaling.
     - In streaming, when a "NotEnoughResourcesAvailable" exception occurs, we 
trigger an auto-scale-down.
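The min/preferred idea can be illustrated with per-profile slot counts. This is a hypothetical sketch: `SlotRequirementsSketch`, `Status`, `evaluate` and `missingForPreferred` are illustrative names, not Flink APIs, and resource profiles are reduced to plain string keys such as `"a"` and `"b"`. Falling below `min` maps to the "NotEnoughResourcesAvailable" case that would trigger a streaming scale-down, while the gap to `preferred` is what the pool would try to acquire.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of min/preferred slot requirements; names are illustrative, not Flink APIs.
public class SlotRequirementsSketch {

    enum Status { PREFERRED_MET, MIN_MET, NOT_ENOUGH_RESOURCES }

    final Map<String, Integer> min;       // e.g. x slots of profile a, y slots of profile b
    final Map<String, Integer> preferred; // e.g. k slots of profile a, i slots of profile b

    SlotRequirementsSketch(Map<String, Integer> min, Map<String, Integer> preferred) {
        this.min = min;
        this.preferred = preferred;
    }

    /** True if every required profile has at least the required number of available slots. */
    private static boolean covers(Map<String, Integer> required, Map<String, Integer> available) {
        for (Map.Entry<String, Integer> e : required.entrySet()) {
            if (available.getOrDefault(e.getKey(), 0) < e.getValue()) {
                return false;
            }
        }
        return true;
    }

    Status evaluate(Map<String, Integer> available) {
        if (!covers(min, available)) {
            return Status.NOT_ENOUGH_RESOURCES; // streaming: trigger auto-scale-down
        }
        return covers(preferred, available) ? Status.PREFERRED_MET : Status.MIN_MET;
    }

    /** Slots per profile still to acquire to reach the preferred target. */
    Map<String, Integer> missingForPreferred(Map<String, Integer> available) {
        Map<String, Integer> missing = new HashMap<>();
        for (Map.Entry<String, Integer> e : preferred.entrySet()) {
            int gap = e.getValue() - available.getOrDefault(e.getKey(), 0);
            if (gap > 0) {
                missing.put(e.getKey(), gap);
            }
        }
        return missing;
    }
}
```

For example, with `min = {a: 2, b: 1}` and `preferred = {a: 4, b: 2}`, having `{a: 2, b: 1}` available meets the minimum, and `missingForPreferred` reports two more slots of profile a and one of profile b.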
   
   Short Term
     - Maybe we assume the minimum is always one slot.
     - SlotPool requests do not fail as long as there is at least one slot of the 
desired resource profile.
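The short-term rule can be expressed as a single check on a pending request. This is a hypothetical sketch: `MinimumOneSlotRule`, `Profile`, `canServe` and `shouldFail` are illustrative names, not Flink APIs. It assumes that a slot can serve a request if it is at least as large in every dimension (CPU cores, memory), and that all registered slots count, free or currently allocated, since an allocated slot may be released later.

```java
import java.util.List;

// Hypothetical sketch of the "minimum is always one slot" rule; names are illustrative, not Flink APIs.
public class MinimumOneSlotRule {

    static final class Profile {
        final double cpuCores;
        final int memoryMb;

        Profile(double cpuCores, int memoryMb) {
            this.cpuCores = cpuCores;
            this.memoryMb = memoryMb;
        }

        /** Assumption: a slot serves a request if it is at least as large in every dimension. */
        boolean canServe(Profile requested) {
            return cpuCores >= requested.cpuCores && memoryMb >= requested.memoryMb;
        }
    }

    /**
     * A pending request is failed only if no registered slot (free or allocated) could ever
     * serve it; otherwise it keeps waiting until a matching slot becomes free.
     */
    static boolean shouldFail(Profile requested, List<Profile> registeredSlots) {
        return registeredSlots.stream().noneMatch(slot -> slot.canServe(requested));
    }
}
```

Under this rule a request for a large slot fails fast when only small slots exist, which is the "resource needs cannot be satisfied" case, while a request that some existing slot could serve simply waits.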

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.