xintongsong commented on issue #8704: [FLINK-12812][runtime] Set resource 
profiles for task slots
URL: https://github.com/apache/flink/pull/8704#issuecomment-508334308
 
 
   Thanks for the review, @StephanEwen. 
   
   I would like to address your concern about the assumption that the RM and TM 
have the same configuration:
   
   The reason we need to calculate the TM's slot resource profiles on the RM 
side is that we need to set the resource profile for `PendingTaskManagerSlot` 
before the corresponding TM is started. 
   
   Currently, Flink can assign a pending slot to a slot request before the TM 
is started and registered. In this way, the subsequent slot requests will first 
consume slots on the pending TM (for multi-slot TMs) before requesting and 
launching a new one. When the TM is registered, the SlotManager matches the 
registered new slot to a `PendingTaskManagerSlot` with the same resource 
profile, and assigns the registered slot to the same slot request that the 
pending slot is assigned to (if any).
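
   The matching step described above can be sketched roughly as follows. This 
is only an illustration, not the actual SlotManager code: `Profile`, 
`PendingSlot`, and `takeMatchingPendingSlot` are hypothetical names, and the 
profile is reduced to two made-up fields.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Objects;
import java.util.Optional;

final class PendingSlotMatchSketch {

    /** Hypothetical stand-in for a slot's resource profile (not Flink's ResourceProfile). */
    static final class Profile {
        final double cpuCores;
        final int memoryMb;

        Profile(double cpuCores, int memoryMb) {
            this.cpuCores = cpuCores;
            this.memoryMb = memoryMb;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Profile)) {
                return false;
            }
            Profile other = (Profile) o;
            return Double.compare(cpuCores, other.cpuCores) == 0 && memoryMb == other.memoryMb;
        }

        @Override
        public int hashCode() {
            return Objects.hash(cpuCores, memoryMb);
        }
    }

    /** Hypothetical stand-in for a pending slot tracked on the RM side. */
    static final class PendingSlot {
        final Profile profile;
        final String assignedRequestId; // null when no slot request has been assigned yet

        PendingSlot(Profile profile, String assignedRequestId) {
            this.profile = profile;
            this.assignedRequestId = assignedRequestId;
        }
    }

    /**
     * Finds the first pending slot whose profile equals the profile of the newly
     * registered slot, removes it from the pending list, and returns it so that
     * the caller can hand its slot request (if any) over to the registered slot.
     */
    static Optional<PendingSlot> takeMatchingPendingSlot(
            List<PendingSlot> pendingSlots, Profile registeredProfile) {
        Iterator<PendingSlot> it = pendingSlots.iterator();
        while (it.hasNext()) {
            PendingSlot candidate = it.next();
            if (candidate.profile.equals(registeredProfile)) {
                it.remove();
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }
}
```

   If the returned pending slot carries a request id, the registered slot 
would take over that request. Otherwise the registered slot simply becomes a 
free slot.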
   
   Before this PR, both the pending slot on the RM side and the actual slot on 
the TM side had the same resource profile `ANY`, so they could be matched with 
`equals`. Since this PR sets the slot resource profile on the TM side to the 
actual resource of the slot, we need to set the resource profile for the pending 
slots on the RM side in the same way. This is why I introduced the calculation 
of the TM's slot resource profiles on the RM side, and the approximate matching.
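
   To illustrate what approximate matching could look like, here is a minimal 
sketch. It assumes, purely for illustration, that two profiles match when each 
field differs by no more than a relative tolerance. The names `Profile`, 
`approximatelyMatches`, and the tolerance parameter are hypothetical and are 
not taken from the PR.

```java
final class ApproximateProfileMatchSketch {

    /** Hypothetical stand-in for a slot's resource profile (not Flink's ResourceProfile). */
    static final class Profile {
        final double cpuCores;
        final int heapMemoryMb;
        final int managedMemoryMb;

        Profile(double cpuCores, int heapMemoryMb, int managedMemoryMb) {
            this.cpuCores = cpuCores;
            this.heapMemoryMb = heapMemoryMb;
            this.managedMemoryMb = managedMemoryMb;
        }
    }

    /** True when a and b differ by at most relTol times the larger magnitude. */
    static boolean approximatelyEqual(double a, double b, double relTol) {
        double scale = Math.max(Math.abs(a), Math.abs(b));
        return Math.abs(a - b) <= relTol * scale;
    }

    /**
     * Assumed matching rule: every field of the RM-side pending profile must be
     * within the relative tolerance of the corresponding TM-reported field.
     */
    static boolean approximatelyMatches(Profile pending, Profile registered, double relTol) {
        return approximatelyEqual(pending.cpuCores, registered.cpuCores, relTol)
                && approximatelyEqual(pending.heapMemoryMb, registered.heapMemoryMb, relTol)
                && approximatelyEqual(pending.managedMemoryMb, registered.managedMemoryMb, relTol);
    }
}
```

   With exact `equals`, a small difference between the value calculated on the 
RM side and the value reported by the TM would leave the pending slot 
unmatched. A tolerance-based comparison like the one assumed here avoids that.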
   
   Assigning slot requests to pending slots and calculating slot resources on 
the RM side only happen on Yarn/Mesos. In these scenarios, the TMs do have the 
same configuration as the RM, because it is transmitted from the RM side. For a 
standalone cluster, there should be no pending slots because the RM cannot 
actively start any TM.
   
   Apart from `PendingTaskManagerSlot`, the RM does use the slot resource 
profile reported by the TM when matching slot requests against registered slots 
and when converting a requested `UNKNOWN` resource profile to a default value 
(as shown in the subsequent PR #8846 for dynamic managed memory). Therefore, it 
should not cause problems on a standalone cluster whose TMs have different 
configs.
   
   It's my fault for not making this clear in the code and comments. I'll 
address the rest of your comments ASAP. I especially appreciate your suggestions 
on encapsulation and simplifying tests. It's a good lesson for me. Thank you again.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
