[ 
https://issues.apache.org/jira/browse/FLINK-12736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16858589#comment-16858589
 ] 

Till Rohrmann commented on FLINK-12736:
---------------------------------------

As a corollary, new partitions could also be stored on the TM if slots can be 
allocated on it while the callback is being processed. I guess that in order 
to solve this problem properly we would need something like a message counter 
between the RM and the TM. Only if the message counter is still the same as it 
was before the partition check message was sent can we be sure that nothing 
has changed on the TM.
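
A rough sketch of what such a counter could look like is given here. It is
only an illustration under assumptions: TaskManagerStub, mayRelease and the
inFlight parameter are hypothetical names, not existing Flink classes or
methods, and the sketch assumes that the TM increments the counter for every
state-changing message (slot allocation, partition storage) that it processes.

```java
/** Hypothetical sketch; none of these classes or methods exist in Flink. */
final class MessageCounterSketch {

    /** Stand-in for a TM that counts every state-changing message it has processed. */
    static final class TaskManagerStub {
        private long messageCounter;
        private int allocatedSlots;
        private int storedPartitions;

        synchronized void allocateSlot() {
            allocatedSlots++;
            messageCounter++;
        }

        synchronized void storePartition() {
            storedPartitions++;
            messageCounter++;
        }

        synchronized long messageCounter() {
            return messageCounter;
        }

        synchronized boolean holdsPartitions() {
            return storedPartitions > 0;
        }
    }

    /**
     * RM-side decision (assumed shape): release only if the TM holds no
     * partitions AND its counter is unchanged since before the check was sent.
     * The inFlight argument simulates messages the TM handles in the meantime.
     */
    static boolean mayRelease(TaskManagerStub tm, Runnable inFlight) {
        long counterBefore = tm.messageCounter();       // recorded before sending the check
        boolean holdsPartitions = tm.holdsPartitions(); // answer to the partition check
        inFlight.run();                                 // messages handled before the callback runs
        long counterNow = tm.messageCounter();          // value seen when the callback is processed
        return !holdsPartitions && counterNow == counterBefore;
    }

    public static void main(String[] args) {
        TaskManagerStub idle = new TaskManagerStub();
        System.out.println(mayRelease(idle, () -> { }));          // nothing changed in between

        TaskManagerStub busy = new TaskManagerStub();
        System.out.println(mayRelease(busy, busy::allocateSlot)); // slot allocated in between
    }
}
```

In this sketch a changed counter simply vetoes the release. The RM would then
have to repeat the idle check later instead of acting on a stale answer.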

> ResourceManager may release TM with allocated slots
> ---------------------------------------------------
>
>                 Key: FLINK-12736
>                 URL: https://issues.apache.org/jira/browse/FLINK-12736
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.9.0
>            Reporter: Chesnay Schepler
>            Priority: Critical
>             Fix For: 1.9.0
>
>
> The {{ResourceManager}} looks out for TaskManagers that have not had any 
> slots allocated on them for a while, as these could be released to save 
> resources. If such a TM is found the RM checks via an RPC call whether the TM 
> still holds any partitions. If no partition is held then the TM is released.
> However, in the RPC callback no check is made whether the TM is actually 
> _still_ idle. In the meantime a slot could've been allocated on the TM.
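
To illustrate the missing check from the description above, here is a minimal
sketch of a callback that looks at the RM's own bookkeeping again before
releasing. All names (TaskManagerRecord, releaseIfStillIdle,
onPartitionCheck) are hypothetical and not taken from the Flink code base,
and the RPC is modelled with a plain CompletableFuture. This only covers what
the RM knows locally about slots; it does not rule out changes on the TM
itself.

```java
import java.util.concurrent.CompletableFuture;

/** Hypothetical sketch; these names are illustrative and not part of Flink. */
final class IdleRecheckSketch {

    /** RM-side bookkeeping for a single TM (assumed shape). */
    static final class TaskManagerRecord {
        private int allocatedSlots;
        private boolean released;

        synchronized void allocateSlot() {
            allocatedSlots++;
        }

        synchronized boolean isReleased() {
            return released;
        }

        /** Releases the TM only if no slot is allocated on it right now. */
        synchronized boolean releaseIfStillIdle() {
            if (allocatedSlots == 0) {
                released = true;
            }
            return released;
        }
    }

    /** Registers the callback for the partition check sent to an idle TM. */
    static void onPartitionCheck(TaskManagerRecord tm, CompletableFuture<Boolean> holdsPartitions) {
        holdsPartitions.thenAccept(holds -> {
            if (!holds) {
                // Re-check idleness here instead of trusting the earlier idle scan.
                tm.releaseIfStillIdle();
            }
        });
    }

    public static void main(String[] args) {
        TaskManagerRecord tm = new TaskManagerRecord();
        CompletableFuture<Boolean> reply = new CompletableFuture<>();
        onPartitionCheck(tm, reply);

        tm.allocateSlot();     // slot allocated while the RPC is in flight
        reply.complete(false); // TM reports that it holds no partitions
        System.out.println("released: " + tm.isReleased());
    }
}
```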



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
