[
https://issues.apache.org/jira/browse/FLINK-12863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Till Rohrmann closed FLINK-12863.
---------------------------------
Resolution: Fixed
Fixed via
1.9.0: a95dac57ef0e1949fd4751ca19350da96c3bf52f
1.8.1: 55c8a69cfa4d40ef2863987eb89adb08f0c45dda
1.7.3: 7333b619fdf3443e179b3f6e8d3147ab4946f91c
> Race condition between slot offerings and AllocatedSlotReport
> -------------------------------------------------------------
>
> Key: FLINK-12863
> URL: https://issues.apache.org/jira/browse/FLINK-12863
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.9.0
> Reporter: Till Rohrmann
> Assignee: Till Rohrmann
> Priority: Critical
> Labels: pull-request-available
> Fix For: 1.7.3, 1.8.1, 1.9.0
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> With FLINK-11059 we introduced the {{AllocatedSlotReport}} which is used by
> the {{TaskExecutor}} to synchronize its internal view on slot allocations
> with the view of the {{JobMaster}}. It seems that there is a race condition
> between offering slots and receiving the report because the
> {{AllocatedSlotReport}} is sent by the {{HeartbeatManagerSenderImpl}} from a
> separate thread.
> As a result, the {{JobMaster}} can generate an {{AllocatedSlotReport}} just
> before new slots are offered to it. Because the report is sent from a
> different thread, the response to the slot offering can be sent before the
> {{AllocatedSlotReport}}. The {{TaskExecutor}} may then receive an outdated
> slot report, which causes it to release active slots.
> To solve the problem, I propose adding a fencing token to the
> {{AllocatedSlotReport}}, which is updated whenever we offer new slots to
> the {{JobMaster}}. When we receive the {{AllocatedSlotReport}} on the
> {{TaskExecutor}}, we compare the current slot report fencing token with the
> received one and only process the report if they are equal. Otherwise, we wait
> for the next heartbeat to send us an up-to-date {{AllocatedSlotReport}}.
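
The proposal above can be illustrated with a minimal, self-contained sketch. This is not the code from the commits listed under "Fixed via". The class names (SlotReportFencingSketch, JobMasterView, TaskExecutorView), the use of a long counter as the token, and the way the token reaches the JobMaster (passed along with the slot offer) are all assumptions made for illustration only.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class SlotReportFencingSketch {

    /** Snapshot of the JobMaster's view, tagged with the last token it saw. */
    static final class AllocatedSlotReport {
        final long fencingToken;
        final Set<String> allocatedSlots;

        AllocatedSlotReport(long fencingToken, Set<String> allocatedSlots) {
            this.fencingToken = fencingToken;
            this.allocatedSlots = new HashSet<>(allocatedSlots); // defensive copy
        }
    }

    /** Hypothetical JobMaster side: remembers the token of the latest offer. */
    static final class JobMasterView {
        private long lastSeenToken = 0L;
        private final Set<String> allocatedSlots = new HashSet<>();

        synchronized void acceptOffer(long token, Set<String> offeredSlots) {
            lastSeenToken = token;
            allocatedSlots.addAll(offeredSlots);
        }

        // May be called from a heartbeat thread; token and slots are read together.
        synchronized AllocatedSlotReport createReport() {
            return new AllocatedSlotReport(lastSeenToken, allocatedSlots);
        }
    }

    /** Hypothetical TaskExecutor side: updates the token on every slot offer. */
    static final class TaskExecutorView {
        private long currentToken = 0L;
        private final Set<String> activeSlots = new HashSet<>();

        synchronized void offerSlots(JobMasterView jobMaster, Set<String> slots) {
            currentToken++; // every offer invalidates reports generated earlier
            activeSlots.addAll(slots);
            jobMaster.acceptOffer(currentToken, slots);
        }

        /** Returns true if the report was processed, false if it was ignored as stale. */
        synchronized boolean onSlotReport(AllocatedSlotReport report) {
            if (report.fencingToken != currentToken) {
                return false; // outdated report: wait for the next heartbeat
            }
            activeSlots.retainAll(report.allocatedSlots); // release unknown slots
            return true;
        }

        synchronized Set<String> activeSlots() {
            return new HashSet<>(activeSlots);
        }
    }

    public static void main(String[] args) {
        JobMasterView jobMaster = new JobMasterView();
        TaskExecutorView taskExecutor = new TaskExecutorView();

        taskExecutor.offerSlots(jobMaster, Collections.singleton("slot-1"));
        // Report is generated just before the next offer, but delivered after it.
        AllocatedSlotReport stale = jobMaster.createReport();
        taskExecutor.offerSlots(jobMaster, Collections.singleton("slot-2"));

        System.out.println(taskExecutor.onSlotReport(stale));                    // false
        System.out.println(taskExecutor.activeSlots().size());                   // 2
        System.out.println(taskExecutor.onSlotReport(jobMaster.createReport())); // true
    }
}
```

In this sketch the stale report carries the token of the first offer, so it is ignored and "slot-2" stays active. The report created after the second offer carries the current token and is processed normally.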