[jira] [Updated] (FLINK-25832) When the TaskManager is closed, its associated slot is not set to the released state.

Piotr Nowojski (Jira) Fri, 04 Feb 2022 03:57:05 -0800


     [ 
https://issues.apache.org/jira/browse/FLINK-25832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Piotr Nowojski updated FLINK-25832:
-----------------------------------
    Component/s: Runtime / Coordination
                     (was: Runtime / Task)

> When the TaskManager is closed, its associated slot is not set to the 
> released state.
> -------------------------------------------------------------------------------------
>
>                 Key: FLINK-25832
>                 URL: https://issues.apache.org/jira/browse/FLINK-25832
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.14.2, 1.14.3
>            Reporter: john
>            Priority: Major
>         Attachments: image-2022-01-27-10-55-14-758.png, 
> image-2022-01-27-10-55-59-119.png, image-2022-01-27-10-57-26-223.png
>
>
> I deployed a standalone Flink cluster on Kubernetes with
> scheduler-mode=reactive enabled. When a TaskManager is shut down, I
> explicitly call the closeTaskManagerConnection method of ResourceManager.
> However, when AdaptiveScheduler then restarts the job, it calls the cancel
> method of Execution, and this method does not check whether the associated
> slot is alive. The TaskManager that owns this slot has already been shut
> down, so an RpcTimeout is triggered.
> I then changed the cancel method of Execution to check whether the slot is
> alive before cancelling, but repeating the steps above still triggers the
> RpcTimeout. The problem as I see it: calling the ResourceManager's
> closeTaskManagerConnection method does not affect the state of the
> associated allocated slots. I think this is a bug. We should change the
> behavior of cancel so that it completes faster in this situation.
> !image-2022-01-27-10-55-14-758.png!!image-2022-01-27-10-55-59-119.png!
> !image-2022-01-27-10-57-26-223.png!
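
The behavior described above can be modeled with a small, self-contained sketch. It is not Flink code: CancelSketch, Slot, Gateway and the "ACK"/"SKIPPED" results are hypothetical stand-ins invented for illustration, and the real Execution, slot and gateway classes differ. It only illustrates the reporter's point that an alive check in cancel helps only if the slot is actually marked as released when the TaskManager connection is closed.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

public class CancelSketch {

    /** Hypothetical stand-in for a slot; not Flink's slot implementation. */
    static final class Slot {
        private volatile boolean alive = true;

        boolean isAlive() {
            return alive;
        }

        /** Models the slot being marked as released. */
        void release() {
            alive = false;
        }
    }

    /** Hypothetical stand-in for the TaskManager RPC gateway; counts cancel RPCs. */
    static final class Gateway {
        final AtomicInteger cancelRpcs = new AtomicInteger();

        CompletableFuture<String> cancelTask() {
            cancelRpcs.incrementAndGet();
            // Against a TaskManager that is already gone, a real RPC would
            // only complete once its timeout fires. Here it returns at once.
            return CompletableFuture.completedFuture("ACK");
        }
    }

    /** Skips the cancel RPC when the slot is no longer alive. */
    static CompletableFuture<String> cancel(Slot slot, Gateway gateway) {
        if (!slot.isAlive()) {
            return CompletableFuture.completedFuture("SKIPPED");
        }
        return gateway.cancelTask();
    }

    public static void main(String[] args) {
        Gateway gateway = new Gateway();
        Slot slot = new Slot();

        // Slot state was never updated: the guard passes and the RPC is sent.
        System.out.println(cancel(slot, gateway).join());

        // Slot released before cancel: the guard short-circuits, no RPC is sent.
        slot.release();
        System.out.println(cancel(slot, gateway).join());
        System.out.println("cancel RPCs sent: " + gateway.cancelRpcs.get());
    }
}
```

In this model the guard has an effect only after release() has been called. That matches the report: adding the check to cancel did not help, because closing the TaskManager connection left the slot state unchanged.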



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
