xintongsong commented on issue #8740: [FLINK-12763][runtime] Fail job 
immediately if tasks’ resource needs can not be satisfied.
URL: https://github.com/apache/flink/pull/8740#issuecomment-508469942
 
 
   Hi @StephanEwen, thank you for the comment, and sorry for being unclear.
   
   The changes in this PR are:
   - Set `ResourceProfile` for slot requests according to the `ResourceSpec`. 
Before this PR, slot requests always carried an `UNKNOWN` resource profile, 
no matter what the `ResourceSpec` was.
   - Fail a slot request immediately if it requests a slot that is too large to be 
satisfied. This avoids waiting for the slot request timeout to discover 
the problem.
     - For Yarn/Mesos, the resource profiles of slots in the cluster are 
determined by the configuration on the RM side. Therefore, the RM knows from the 
very beginning which slot resource profiles are available.
     - For Standalone, the RM does not know which slots exist and what resource 
profiles they have until the TMs are registered. If the RM receives a slot request 
that cannot be satisfied by any registered slot, it doesn't know whether to 
fail the request or to wait for other TMs to register. The solution in this PR 
is to have an initial period after the RM is started, expecting that most TMs 
will register with the RM during this period. Slot requests with any resource 
profile are allowed to stay pending during this period. After this period, 
pending and newly arriving requests that cannot be satisfied by any registered 
slot are failed.
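
   To illustrate the Standalone behavior described above, here is a minimal, 
self-contained sketch. It is not the code in this PR: `StartupPeriodSlotCheck`, 
`Profile`, `Outcome` and all method names are hypothetical stand-ins, and the 
profile is reduced to CPU cores and memory only.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch of the startup-period check; not Flink's actual classes. */
final class StartupPeriodSlotCheck {

    /** Simplified stand-in for a resource profile (assumption: CPU and memory only). */
    static final class Profile {
        final double cpuCores;
        final int memoryMb;

        Profile(double cpuCores, int memoryMb) {
            this.cpuCores = cpuCores;
            this.memoryMb = memoryMb;
        }

        /** A slot can host a request if it is at least as large in every dimension. */
        boolean canHost(Profile requested) {
            return cpuCores >= requested.cpuCores && memoryMb >= requested.memoryMb;
        }
    }

    enum Outcome { PENDING, FAILED }

    private final long startupPeriodEndMillis;
    private final List<Profile> registeredSlots = new ArrayList<>();
    private final List<Profile> pendingRequests = new ArrayList<>();

    StartupPeriodSlotCheck(long rmStartMillis, long startupPeriodMillis) {
        this.startupPeriodEndMillis = rmStartMillis + startupPeriodMillis;
    }

    /** Called when a TM registers one of its slots. */
    void registerSlot(Profile slot) {
        registeredSlots.add(slot);
    }

    /** During the startup period any request may stay pending; afterwards it must fit a registered slot. */
    Outcome request(Profile requested, long nowMillis) {
        if (nowMillis < startupPeriodEndMillis || isSatisfiable(requested)) {
            pendingRequests.add(requested);
            return Outcome.PENDING;
        }
        return Outcome.FAILED;
    }

    /** Once the startup period is over, removes and returns pending requests that no registered slot can host. */
    List<Profile> failUnsatisfiable(long nowMillis) {
        List<Profile> failed = new ArrayList<>();
        if (nowMillis < startupPeriodEndMillis) {
            return failed; // still waiting for TMs to register
        }
        Iterator<Profile> it = pendingRequests.iterator();
        while (it.hasNext()) {
            Profile requested = it.next();
            if (!isSatisfiable(requested)) {
                it.remove();
                failed.add(requested);
            }
        }
        return failed;
    }

    private boolean isSatisfiable(Profile requested) {
        for (Profile slot : registeredSlots) {
            if (slot.canHost(requested)) {
                return true;
            }
        }
        return false;
    }
}
```

   In this sketch, an oversized request made during the startup period stays 
pending and is only failed by `failUnsatisfiable` once the period has elapsed, 
while the same request made after the period is failed right away.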
   
   I'll rebase the PR to the latest code and reorganize it.
