[ https://issues.apache.org/jira/browse/FLINK-31297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695795#comment-17695795 ]
Weihua Hu commented on FLINK-31297:
-----------------------------------
Thanks [~mapohl] for reporting this bug. Thanks [~Weijie Guo] [~xzw0223] for
your attention.
This bug was introduced by FLINK-18229. With that change, we release pending
task managers when there are no more resource requirements.
In
FineGrainedSlotManagerTest.testTaskManagerRegistrationDeductPendingTaskManager,
we bypass the FineGrainedSlotManager and invoke
TaskManagerTracker.addPendingTaskManager directly to allocate a pending task
manager. This makes the resource requirements inconsistent between the
SlotManager and the TaskManagerTracker. After requirementCheckDelay (50 ms by
default), the requirement check releases the pending task manager, so the
assertion that expects one pending task manager can find none.
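To illustrate the mismatch, here is a minimal toy model. It is not Flink code:
ToyTracker, ToySlotManager, and pendingAfterCheck are hypothetical names made
up for this sketch, and the delayed check is invoked by hand instead of being
scheduled after requirementCheckDelay. It only shows why a pending task manager
that is added behind the slot manager's back disappears at the next check.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only; none of these classes exist in Flink.
final class PendingTaskManagerSketch {

    /** Hypothetical stand-in for the tracker: it only remembers pending task managers. */
    static final class ToyTracker {
        private final List<String> pending = new ArrayList<>();

        void addPendingTaskManager(String id) {
            pending.add(id);
        }

        void clearPending() {
            pending.clear();
        }

        int pendingCount() {
            return pending.size();
        }
    }

    /** Hypothetical stand-in for the slot manager: it owns the requirements and the check. */
    static final class ToySlotManager {
        private final ToyTracker tracker;
        private int requiredSlots = 0;

        ToySlotManager(ToyTracker tracker) {
            this.tracker = tracker;
        }

        void declareRequirements(int slots) {
            requiredSlots = slots;
        }

        /** Models the delayed requirement check: no requirements means pending TMs are released. */
        void checkResourceRequirements() {
            if (requiredSlots == 0) {
                tracker.clearPending();
            }
        }
    }

    /** Adds one pending task manager directly on the tracker, runs the check, returns what is left. */
    static int pendingAfterCheck(boolean declareRequirementFirst) {
        ToyTracker tracker = new ToyTracker();
        ToySlotManager slotManager = new ToySlotManager(tracker);
        if (declareRequirementFirst) {
            slotManager.declareRequirements(1);
        }
        // The slot manager is bypassed here, as in the test.
        tracker.addPendingTaskManager("pending-tm-1");
        slotManager.checkResourceRequirements();
        return tracker.pendingCount();
    }

    public static void main(String[] args) {
        System.out.println(pendingAfterCheck(false)); // prints 0
        System.out.println(pendingAfterCheck(true));  // prints 1
    }
}
```

In this toy model the pending task manager survives only when the slot manager
also knows about a requirement. Whether the test should be fixed that way is a
separate question; the sketch is just meant to show the inconsistency.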
> FineGrainedSlotManagerTest.testTaskManagerRegistrationDeductPendingTaskManager
> unstable when running it a single time
> ---------------------------------------------------------------------------------------------------------------------
>
> Key: FLINK-31297
> URL: https://issues.apache.org/jira/browse/FLINK-31297
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.17.0
> Reporter: Matthias Pohl
> Assignee: Weihua Hu
> Priority: Critical
> Labels: pull-request-available, test-stability
>
> We noticed a weird test-instability in
> {{FineGrainedSlotManagerTest.testTaskManagerRegistrationDeductPendingTaskManager}}
> when switching to sequential test execution (see FLINK-31278). I couldn't
> reproduce it in 1.16, therefore, marking it as a blocker for now. But it
> feels to be more of a test code issue.
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46671&view=logs&j=0da23115-68bb-5dcd-192c-bd4c8adebde1&t=24c3384f-1bcb-57b3-224f-51bf973bbee8&l=9695
> {code}
> Mar 01 15:20:17 [ERROR]
> org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManagerTest.testTaskManagerRegistrationDeductPendingTaskManager
> Time elapsed: 0.746 s <<< FAILURE!
> Mar 01 15:20:17 java.lang.AssertionError:
> Mar 01 15:20:17
> Mar 01 15:20:17 Expected size: 1 but was: 0 in:
> Mar 01 15:20:17 []
> Mar 01 15:20:17 at
> org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManagerTest.testTaskManagerRegistrationDeductPendingTaskManager(FineGrainedSlotManagerTest.java:209)
> Mar 01 15:20:17
> {code}