StephanEwen commented on issue #8740: [FLINK-12763][runtime] Fail job immediately if tasks’ resource needs can not be satisfied.
URL: https://github.com/apache/flink/pull/8740#issuecomment-508479800

I think that the problem you describe is a more general problem for the standalone resource manager. In standalone mode, it can take a long time until the "not enough resources" exception comes for streaming jobs, and the "no matching slot" exception for batch jobs.

So why don't we solve it in a more general way? I like the idea of a "startup period" in which the standalone RM waits with a longer timeout for TMs (and thus slots) to appear, and after that period slot requests are failed immediately if no free slot is readily available. That idea has floated around for a bit, maybe it is time to go for it.

What I don't quite understand is the "mixed solution" in this PR, in which the startup period is used to discover what resource profiles are available. After that, requests still time out after a long time unless they request a resource profile that is incompatible with the ones seen during the startup period.

I think this may lead to strange behavior:

- TaskManagers that register late might not get used. You can start larger TMs later, they register, but slot requests still fail.
- A profile might be available during the startup period, but the TMs shut down later, and the slot requests cannot be fulfilled any more. But the requests still take a long time to fail, because the resource profile was a known profile.

All this becomes both easier and more consistent with a simple startup period for the StandaloneResourceManager. After that, all requests fail immediately unless a slot is directly available.

What do you think?

BTW: This would be a change we need to discuss on the dev/user mailing lists, because it changes system behavior. Probably most users would agree that it is for the better, but nonetheless, we need to be transparent there.
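To make the proposed behavior concrete, here is a rough sketch of the simple startup-period rule. It is only an illustration under stated assumptions, not Flink's actual StandaloneResourceManager: the class `StartupPeriodSlotPolicy`, the `Decision` enum, and the `decide` method are hypothetical names, and the current time is passed in explicitly so the logic stays deterministic.

```java
import java.time.Duration;

/**
 * Hypothetical sketch of the "startup period" rule discussed above.
 * None of these names exist in Flink; they are assumptions for illustration.
 */
final class StartupPeriodSlotPolicy {

    /** Possible outcomes for a single slot request. */
    enum Decision { FULFILL_NOW, WAIT_FOR_TASK_MANAGERS, FAIL_IMMEDIATELY }

    private final long startMillis;
    private final long startupPeriodMillis;

    StartupPeriodSlotPolicy(long startMillis, Duration startupPeriod) {
        this.startMillis = startMillis;
        this.startupPeriodMillis = startupPeriod.toMillis();
    }

    /** True while the resource manager is still waiting for TaskManagers to register. */
    boolean inStartupPeriod(long nowMillis) {
        return nowMillis - startMillis < startupPeriodMillis;
    }

    /**
     * Decides what to do with a slot request.
     *
     * @param nowMillis         current time, supplied by the caller
     * @param freeMatchingSlots number of free slots that match the request right now
     */
    Decision decide(long nowMillis, int freeMatchingSlots) {
        if (freeMatchingSlots > 0) {
            // A slot is directly available, regardless of when its TM registered.
            return Decision.FULFILL_NOW;
        }
        // No slot available: wait only during the startup period, fail fast afterwards.
        return inStartupPeriod(nowMillis)
                ? Decision.WAIT_FOR_TASK_MANAGERS
                : Decision.FAIL_IMMEDIATELY;
    }
}
```

Because the decision depends only on the elapsed time and on whether a matching slot is free right now, a TaskManager that registers late is still used, and a profile that was seen earlier but has since disappeared does not cause long waits.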
