KarmaGYZ commented on code in PR #19840:
URL: https://github.com/apache/flink/pull/19840#discussion_r899686715
##########
flink-core/src/main/java/org/apache/flink/configuration/ResourceManagerOptions.java:
##########
@@ -150,6 +150,18 @@ public class ResourceManagerOptions {
.defaultValue(Duration.ofMillis(50))
.withDescription("The delay of the resource requirements
check.");
+ /**
+ * The long delay of requirements check. This is only used for waiting for
the previous
+ * TaskManagers registration and will only take effect in the first
requirements check.
+ */
+ public static final ConfigOption<Duration> REQUIREMENTS_CHECK_LONG_DELAY =
Review Comment:
```suggestion
@Documentation.Section(Documentation.Sections.EXPERT_SCHEDULING)
public static final ConfigOption<Duration> REQUIREMENTS_CHECK_LONG_DELAY
=
```
##########
flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/slotmanager/FineGrainedSlotManager.java:
##########
@@ -491,10 +504,15 @@ public void freeSlot(SlotID slotId, AllocationID
allocationId) {
* are performed with a slight delay.
*/
private void checkResourceRequirementsWithDelay() {
- if (requirementsCheckDelay.toMillis() <= 0) {
- checkResourceRequirements();
- } else {
- if (requirementsCheckFuture == null ||
requirementsCheckFuture.isDone()) {
+ long delay =
+ isRequirementsCheckLongDelay
+ ? requirementsCheckLongDelay.toMillis()
+ : requirementsCheckDelay.toMillis();
+ boolean pendingRequirementsCheck =
Review Comment:
```suggestion
boolean hasPendingRequirementsCheck =
```
##########
flink-core/src/main/java/org/apache/flink/configuration/ResourceManagerOptions.java:
##########
@@ -150,6 +150,18 @@ public class ResourceManagerOptions {
.defaultValue(Duration.ofMillis(50))
.withDescription("The delay of the resource requirements
check.");
+ /**
+ * The long delay of requirements check. This is only used for waiting for
the previous
+ * TaskManagers registration and will only take effect in the first
requirements check.
+ */
+ public static final ConfigOption<Duration> REQUIREMENTS_CHECK_LONG_DELAY =
+ ConfigOptions.key("slotmanager.requirement-check.long-delay")
+ .durationType()
+ .defaultValue(Duration.ofSeconds(3))
Review Comment:
I think the default value might be too long. 1s or 500ms might be good
enough?
##########
flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/slotmanager/DeclarativeSlotManager.java:
##########
@@ -434,10 +446,15 @@ public void freeSlot(SlotID slotId, AllocationID
allocationId) {
* are performed with a slight delay.
*/
private void checkResourceRequirementsWithDelay() {
Review Comment:
Same as the `FineGrainedSlotManager`.
##########
flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/slotmanager/FineGrainedSlotManager.java:
##########
@@ -504,8 +522,11 @@ private void checkResourceRequirementsWithDelay() {
Preconditions.checkNotNull(requirementsCheckFuture)
.complete(null);
}),
- requirementsCheckDelay.toMillis(),
+ delay,
TimeUnit.MILLISECONDS);
+ isRequirementsCheckLongDelay = false;
+ } else {
+ checkResourceRequirements();
Review Comment:
It seems a bit complicated here. We just need to ensure the delay is no less
than zero. Then, the `ScheduledExecutor#schedule` will help us to handle it.
##########
flink-runtime/src/test/java/org/apache/flink/runtime/resourcemanager/slotmanager/AbstractFineGrainedSlotManagerITCase.java:
##########
@@ -752,4 +753,76 @@ public void
testAllocationUpdatesIgnoredIfSlotMarkedAsAllocatedAfterSlotReport()
System.setSecurityManager(null);
}
+
+ @Test
+ public void testRequirementLongDelayOnlyTakeEffectOnce() throws Exception {
+ final CompletableFuture<Void> allocationIdFuture1 = new
CompletableFuture<>();
+ final CompletableFuture<Void> allocationIdFuture2 = new
CompletableFuture<>();
+ final TaskExecutorGateway taskExecutorGateway1 =
+ new TestingTaskExecutorGatewayBuilder()
+ .setRequestSlotFunction(
+ tuple6 -> {
+ allocationIdFuture1.complete(null);
+ return
CompletableFuture.completedFuture(Acknowledge.get());
+ })
+ .createTestingTaskExecutorGateway();
+ final TaskExecutorGateway taskExecutorGateway2 =
+ new TestingTaskExecutorGatewayBuilder()
+ .setRequestSlotFunction(
+ tuple6 -> {
+ allocationIdFuture2.complete(null);
+ return
CompletableFuture.completedFuture(Acknowledge.get());
+ })
+ .createTestingTaskExecutorGateway();
+ final TaskExecutorConnection taskExecutorConnection1 =
+ new TaskExecutorConnection(ResourceID.generate(),
taskExecutorGateway1);
+ final TaskExecutorConnection taskExecutorConnection2 =
+ new TaskExecutorConnection(ResourceID.generate(),
taskExecutorGateway2);
+ new Context() {
+ {
+ runTest(
+ () -> {
+ final CompletableFuture<Boolean>
registerTaskManagerFuture1 =
+ new CompletableFuture<>();
+ final CompletableFuture<Boolean>
registerTaskManagerFuture2 =
+ new CompletableFuture<>();
+ runInMainThread(
+ () -> {
+
getSlotManager().enlargeRequirementsCheckDelayOnce();
+ getSlotManager()
+ .processResourceRequirements(
+
createResourceRequirements(
+ new JobID(),
+ 2,
+
DEFAULT_SLOT_RESOURCE_PROFILE));
+ registerTaskManagerFuture1.complete(
+ getSlotManager()
+ .registerTaskManager(
+
taskExecutorConnection1,
+ new
SlotReport(),
+
DEFAULT_SLOT_RESOURCE_PROFILE,
+
DEFAULT_SLOT_RESOURCE_PROFILE));
+ });
+ assertThat(
+
assertFutureCompleteAndReturn(registerTaskManagerFuture1),
+ is(true));
+ assertFutureNotComplete(allocationIdFuture1);
+ allocationIdFuture1.get(200,
TimeUnit.MILLISECONDS);
Review Comment:
It's a bit fragile. Maybe we can enlarge the `requirementCheckLongDelay` to
400ms.
##########
flink-runtime/src/test/java/org/apache/flink/runtime/resourcemanager/slotmanager/AbstractFineGrainedSlotManagerITCase.java:
##########
@@ -752,4 +753,76 @@ public void
testAllocationUpdatesIgnoredIfSlotMarkedAsAllocatedAfterSlotReport()
System.setSecurityManager(null);
}
+
+ @Test
+ public void testRequirementLongDelayOnlyTakeEffectOnce() throws Exception {
+ final CompletableFuture<Void> allocationIdFuture1 = new
CompletableFuture<>();
+ final CompletableFuture<Void> allocationIdFuture2 = new
CompletableFuture<>();
+ final TaskExecutorGateway taskExecutorGateway1 =
+ new TestingTaskExecutorGatewayBuilder()
+ .setRequestSlotFunction(
+ tuple6 -> {
+ allocationIdFuture1.complete(null);
+ return
CompletableFuture.completedFuture(Acknowledge.get());
+ })
+ .createTestingTaskExecutorGateway();
+ final TaskExecutorGateway taskExecutorGateway2 =
+ new TestingTaskExecutorGatewayBuilder()
+ .setRequestSlotFunction(
+ tuple6 -> {
+ allocationIdFuture2.complete(null);
+ return
CompletableFuture.completedFuture(Acknowledge.get());
+ })
+ .createTestingTaskExecutorGateway();
+ final TaskExecutorConnection taskExecutorConnection1 =
+ new TaskExecutorConnection(ResourceID.generate(),
taskExecutorGateway1);
+ final TaskExecutorConnection taskExecutorConnection2 =
+ new TaskExecutorConnection(ResourceID.generate(),
taskExecutorGateway2);
+ new Context() {
+ {
+ runTest(
+ () -> {
+ final CompletableFuture<Boolean>
registerTaskManagerFuture1 =
+ new CompletableFuture<>();
+ final CompletableFuture<Boolean>
registerTaskManagerFuture2 =
+ new CompletableFuture<>();
+ runInMainThread(
+ () -> {
+
getSlotManager().enlargeRequirementsCheckDelayOnce();
+ getSlotManager()
+ .processResourceRequirements(
+
createResourceRequirements(
+ new JobID(),
+ 2,
+
DEFAULT_SLOT_RESOURCE_PROFILE));
+ registerTaskManagerFuture1.complete(
+ getSlotManager()
+ .registerTaskManager(
+
taskExecutorConnection1,
+ new
SlotReport(),
+
DEFAULT_SLOT_RESOURCE_PROFILE,
+
DEFAULT_SLOT_RESOURCE_PROFILE));
+ });
+ assertThat(
+
assertFutureCompleteAndReturn(registerTaskManagerFuture1),
+ is(true));
+ assertFutureNotComplete(allocationIdFuture1);
+ allocationIdFuture1.get(200,
TimeUnit.MILLISECONDS);
Review Comment:
And here, we just use `assertFutureCompleteAndReturn`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]