xintongsong commented on issue #8740: [FLINK-12763][runtime] Fail job immediately if tasks’ resource needs can not be satisfied.
URL: https://github.com/apache/flink/pull/8740#issuecomment-508495382

I think you are right. I can see the problem with the current solution.

Just one question about the solution you proposed: what do we do for batch jobs?
- Do we allow requests to stay pending and wait for other tasks to finish and release resources? If so, we need different behaviors for streaming and batch jobs, which requires the resource manager to be aware of the two job types.
- Or do we always fail the requests and tell the JM why the request failed: either because the requested resource is too large (probably shouldn't retry) or because there are temporarily no available slots (may retry later)? This might starve some jobs, because before a job retries its failed requests, resources may become available and then be taken by other requests that arrive later.
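
To make the second option concrete, here is a minimal sketch of how a failure reason could be derived. It is an illustration only: `SlotRequestSketch`, `Decision`, `Slot`, and `decide` are hypothetical names, not Flink APIs, and it assumes a fixed set of slots with a single resource dimension (no new containers can be started).

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; none of these names exist in Flink. */
final class SlotRequestSketch {

    /** Outcomes under the "always fail with a reason" option. */
    enum Decision {
        FULFILLED,          // a free slot can satisfy the request right now
        FAILED_RETRY_LATER, // a slot is large enough, but it is currently occupied
        FAILED_TOO_LARGE    // no slot could ever satisfy the request
    }

    /** A slot with one resource dimension (assumed here: memory in MB). */
    static final class Slot {
        final int capacity;
        final boolean free;

        Slot(int capacity, boolean free) {
            this.capacity = capacity;
            this.free = free;
        }
    }

    /** Decides how to answer a request for the given amount of resource. */
    static Decision decide(int requested, List<Slot> slots) {
        boolean fitsSomewhere = false;
        for (Slot slot : slots) {
            if (slot.capacity >= requested) {
                if (slot.free) {
                    return Decision.FULFILLED;
                }
                // Large enough but busy: the request may succeed on a later retry.
                fitsSomewhere = true;
            }
        }
        return fitsSomewhere ? Decision.FAILED_RETRY_LATER : Decision.FAILED_TOO_LARGE;
    }

    public static void main(String[] args) {
        List<Slot> slots = Arrays.asList(new Slot(1024, false), new Slot(512, true));
        System.out.println(decide(256, slots));  // FULFILLED
        System.out.println(decide(1024, slots)); // FAILED_RETRY_LATER
        System.out.println(decide(4096, slots)); // FAILED_TOO_LARGE
    }
}
```

The starvation concern shows up in the `FAILED_RETRY_LATER` case: nothing here reserves the busy 1024 MB slot for the failed request, so a request that arrives later could take it as soon as it is released.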

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
