xintongsong commented on issue #8740: [FLINK-12763][runtime] Fail job 
immediately if tasks’ resource needs can not be satisfied.
URL: https://github.com/apache/flink/pull/8740#issuecomment-508495382
 
 
   I think you are right. I can see the problem with the current solution.
   
   Just one question about the solution you proposed. What do we do for 
batch jobs?
   - Do we allow requests to stay pending until other tasks finish and 
release their resources? If so, we need different behaviors for 
streaming and batch jobs, which requires the resource manager to be aware of 
the two job types.
   - Or do we always fail the requests and tell the JM why the request 
failed: either because the requested resource is too large (probably shouldn't 
retry) or because there are temporarily no available slots (may retry later)? 
This might starve some jobs, because before a job retries its failed requests, 
resources may become available and then be taken by other requests that arrive 
later.
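
   To make the second option concrete, here is a minimal sketch of how a 
failure reason could tell the JM whether a retry makes sense. All names here 
(`SlotRequestSketch`, `FailureReason`, `decide`) are hypothetical and not part 
of Flink's API, and a resource is reduced to a single memory size in MB, which 
is an assumption made only for illustration.

```java
// Hypothetical sketch, not Flink code: classify a slot request as granted,
// permanently unsatisfiable, or temporarily unsatisfiable.
final class SlotRequestSketch {

    // Why a request failed; the JM would use this to decide whether to retry.
    enum FailureReason {
        EXCEEDS_LARGEST_SLOT,    // no slot could ever satisfy the request
        TEMPORARILY_UNAVAILABLE  // a slot could fit, but none is free right now
    }

    static final class Decision {
        final FailureReason reason; // null means the request was granted

        Decision(FailureReason reason) {
            this.reason = reason;
        }

        boolean granted() {
            return reason == null;
        }

        // Only a temporary shortage is worth retrying.
        boolean shouldRetry() {
            return reason == FailureReason.TEMPORARILY_UNAVAILABLE;
        }
    }

    // requestedMb: size of the requested resource (assumed single dimension)
    // largestSlotMb: largest slot the cluster could ever provide
    // largestFreeSlotMb: largest slot that is free at the moment
    static Decision decide(int requestedMb, int largestSlotMb, int largestFreeSlotMb) {
        if (requestedMb > largestSlotMb) {
            return new Decision(FailureReason.EXCEEDS_LARGEST_SLOT);
        }
        if (requestedMb > largestFreeSlotMb) {
            return new Decision(FailureReason.TEMPORARILY_UNAVAILABLE);
        }
        return new Decision(null);
    }
}
```

   Note that this sketch does nothing about the starvation concern: a request 
that fails with `TEMPORARILY_UNAVAILABLE` keeps no place in any queue, so a 
request that arrives later can still take the resources first.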

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
