[ https://issues.apache.org/jira/browse/FLINK-12736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16858589#comment-16858589 ]

Till Rohrmann commented on FLINK-12736:
---------------------------------------
As a corollary, new partitions could also be stored on the TM if slots can be
allocated on it while the callback is being processed. I guess in order to
properly solve this problem we would need something like a message counter
between the RM and the TM. Only if the message counter is the same as before
sending the partition check message can we be sure that nothing has changed on
the TM.
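
To make the message counter idea concrete, here is a minimal sketch of one
possible reading of it. It is not how Flink implements this. The class and
method names (TmMessageCounter, ReleaseCheck, mayRelease) are hypothetical,
and the sketch assumes that all calls happen on a single thread and that
every RM/TM message that could change the TM's state is recorded.

```java
// Hypothetical RM-side counter of messages exchanged with one TM.
// Assumption: every message that could change the TM's state is recorded here.
class TmMessageCounter {
    private long count;

    // Call whenever a state-changing message is exchanged with the TM.
    void recordMessage() {
        count++;
    }

    // Value to remember just before sending the partition check message.
    long snapshot() {
        return count;
    }

    // True if no message has been recorded since the snapshot was taken.
    boolean unchangedSince(long snapshot) {
        return count == snapshot;
    }
}

class ReleaseCheck {
    // Decide in the partition check callback whether the TM may be released.
    // holdsPartitions is the (hypothetical) answer returned by the TM.
    static boolean mayRelease(TmMessageCounter counter,
                              long snapshotBeforeCheck,
                              boolean holdsPartitions) {
        return !holdsPartitions && counter.unchangedSince(snapshotBeforeCheck);
    }
}
```

With this, the RM would take a snapshot before sending the partition check
and skip the release if the counter has moved by the time the callback runs,
even if the TM reported no partitions.
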
> ResourceManager may release TM with allocated slots
> ---------------------------------------------------
>
> Key: FLINK-12736
> URL: https://issues.apache.org/jira/browse/FLINK-12736
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.9.0
> Reporter: Chesnay Schepler
> Priority: Critical
> Fix For: 1.9.0
>
>
> The {{ResourceManager}} looks out for TaskManagers that have not had any
> slots allocated on them for a while, as these could be released to save
> resources. If such a TM is found the RM checks via an RPC call whether the TM
> still holds any partitions. If no partition is held then the TM is released.
> However, in the RPC callback no check is made whether the TM is actually
> _still_ idle. In the meantime a slot could've been allocated on the TM.
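
The missing check described above could look roughly like the following
sketch. It is only an illustration, not the actual ResourceManager code.
IdleTaskManager, IdleReleaseCheck and releaseIfStillIdle are hypothetical
names, and the CompletableFuture stands in for the RPC call that asks the TM
whether it still holds partitions.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical RM-side bookkeeping for a single TM.
class IdleTaskManager {
    int allocatedSlots;
    boolean released;

    boolean isIdle() {
        return allocatedSlots == 0;
    }
}

class IdleReleaseCheck {
    // noPartitionsHeld completes with true if the TM reports no partitions.
    static CompletableFuture<Void> releaseIfStillIdle(
            IdleTaskManager tm, CompletableFuture<Boolean> noPartitionsHeld) {
        return noPartitionsHeld.thenAccept(canBeReleased -> {
            // Re-check idleness inside the callback: a slot may have been
            // allocated while the RPC was in flight.
            if (canBeReleased && tm.isIdle()) {
                tm.released = true;
            }
        });
    }
}
```

Note that this only covers slots allocated before the callback runs. It does
not cover partitions stored after the TM has already answered the check.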