[ 
https://issues.apache.org/jira/browse/FLINK-31297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695795#comment-17695795
 ] 

Weihua Hu commented on FLINK-31297:
-----------------------------------

Thanks [~mapohl] for reporting this bug. Thanks [~Weijie Guo] [~xzw0223] for 
your attention.

This bug was introduced by FLINK-18229. Since that change, we release the 
pending task managers when there are no more resource requirements.

In 
FineGrainedSlotManagerTest.testTaskManagerRegistrationDeductPendingTaskManager, 
we bypass the FineGrainedSlotManager and invoke 
TaskManagerTracker.addPendingTaskManager directly to allocate a pending task 
manager. This makes the resource requirements inconsistent between the 
SlotManager and the TaskManagerTracker: the tracker holds a pending task 
manager, but the SlotManager has no matching resource requirement. After 
requirementCheckDelay (50 ms by default), the requirement check releases the 
pending task manager.
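
To make the sequence easier to follow, here is a minimal toy model of it. This 
is only a sketch: ToyTracker, ToySlotManager and pendingAfterCheck are 
hypothetical names, not Flink classes, and the delayed requirement check is 
modelled as a plain method call instead of a scheduled task.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model only. None of these types are Flink's real classes. */
public class PendingTaskManagerSketch {

    /** Hypothetical stand-in for the tracker: remembers pending task managers. */
    static class ToyTracker {
        private final Set<String> pending = new HashSet<>();

        void addPendingTaskManager(String id) {
            pending.add(id);
        }

        void clearPending() {
            pending.clear();
        }

        int pendingCount() {
            return pending.size();
        }
    }

    /** Hypothetical stand-in for the slot manager: owns the requirements. */
    static class ToySlotManager {
        private final ToyTracker tracker;
        private final List<String> requirements = new ArrayList<>();

        ToySlotManager(ToyTracker tracker) {
            this.tracker = tracker;
        }

        /** Normal path: record the requirement, then allocate a pending task manager. */
        void declareRequirement(String requirement) {
            requirements.add(requirement);
            tracker.addPendingTaskManager("pending-for-" + requirement);
        }

        /** Models the check that runs after the delay (assumed behaviour). */
        void runRequirementCheck() {
            if (requirements.isEmpty()) {
                // No requirements known here, so pending task managers are released.
                tracker.clearPending();
            }
        }
    }

    /** Returns how many pending task managers survive one requirement check. */
    public static int pendingAfterCheck(boolean bypassSlotManager) {
        ToyTracker tracker = new ToyTracker();
        ToySlotManager slotManager = new ToySlotManager(tracker);
        if (bypassSlotManager) {
            // What the test does: add to the tracker behind the slot manager's back.
            tracker.addPendingTaskManager("pending-1");
        } else {
            slotManager.declareRequirement("job-1");
        }
        slotManager.runRequirementCheck();
        return tracker.pendingCount();
    }

    public static void main(String[] args) {
        System.out.println("bypassing the slot manager: " + pendingAfterCheck(true));
        System.out.println("through the slot manager:   " + pendingAfterCheck(false));
    }
}
```

In this model the bypassing path ends with 0 pending task managers and the 
normal path ends with 1. In the real test the check runs on a timer, so 
whether it fires before the assertion depends on timing.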

> FineGrainedSlotManagerTest.testTaskManagerRegistrationDeductPendingTaskManager
>  unstable when running it a single time
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-31297
>                 URL: https://issues.apache.org/jira/browse/FLINK-31297
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.17.0
>            Reporter: Matthias Pohl
>            Assignee: Weihua Hu
>            Priority: Critical
>              Labels: pull-request-available, test-stability
>
> We noticed a weird test-instability in 
> {{FineGrainedSlotManagerTest.testTaskManagerRegistrationDeductPendingTaskManager}}
>  when switching to sequential test execution (see FLINK-31278). I couldn't 
> reproduce it in 1.16, therefore, marking it as a blocker for now. But it 
> feels to be more of a test code issue.
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46671&view=logs&j=0da23115-68bb-5dcd-192c-bd4c8adebde1&t=24c3384f-1bcb-57b3-224f-51bf973bbee8&l=9695
> {code}
> Mar 01 15:20:17 [ERROR] 
> org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManagerTest.testTaskManagerRegistrationDeductPendingTaskManager
>   Time elapsed: 0.746 s  <<< FAILURE!
> Mar 01 15:20:17 java.lang.AssertionError: 
> Mar 01 15:20:17 
> Mar 01 15:20:17 Expected size: 1 but was: 0 in:
> Mar 01 15:20:17 []
> Mar 01 15:20:17       at 
> org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManagerTest.testTaskManagerRegistrationDeductPendingTaskManager(FineGrainedSlotManagerTest.java:209)
> Mar 01 15:20:17 
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
