xintongsong commented on a change in pull request #11615: [FLINK-16605] Add max
limitation to the total number of slots
URL: https://github.com/apache/flink/pull/11615#discussion_r406571178
##########
File path:
flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/slotmanager/SlotManagerImpl.java
##########
@@ -375,6 +375,12 @@ public void registerTaskManager(final
TaskExecutorConnection taskExecutorConnect
if
(taskManagerRegistrations.containsKey(taskExecutorConnection.getInstanceID())) {
reportSlotStatus(taskExecutorConnection.getInstanceID(), initialSlotReport);
} else {
+ if (getNumberRegisteredSlots() +
Math.max(getNumberPendingTaskManagerSlots(), numSlotsPerWorker) > maxSlotNum) {
+ LOG.warn("The total number of slots exceeds the
max limitation, release the excess resource.");
+
resourceActions.releaseResource(taskExecutorConnection.getInstanceID(), new
FlinkException("The total number of slots exceeds the max limitation."));
+ return;
+ }
Review comment:
It seems to me that this approach implicitly assumes that `releaseResource`
returns false means running on a standalone cluster, which is not always true.
Without this assumption, continuing registering the slot when the release
action failed seems quite against intuition.
Instead of let the slot manager behaves differently according to the release
action result, I would suggest to simply pass in infinite large values for the
max limits when creating the slot manager. In this way, slot manager should not
need to behave differently on standalone and active resource managers.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services