tillrohrmann edited a comment on issue #9288: [FLINK-13421] [runtime] Do not allocate slots in a MultiTaskSlot when it’s releasing children
URL: https://github.com/apache/flink/pull/9288#issuecomment-517262657

Thanks a lot @zhuzhurk. I think with your test case I was able to understand the problem. Here is the description for future reference:

The underlying problem is that a releasing `MultiTaskSlot` might be reused by other slot requests, as @zhuzhurk stated. This can happen in the following case:

Assume we have the following topology with slot sharing enabled, eager deployment and two allocated slots `s1, s2`.

```
src1 \
src2 - sink1
src3 /
```

Flink would assign `s1 -> src1` and `s2 -> src2`. Since `sink1` has three predecessors, it will wait for `src3` to get a slot assigned before it can get scheduled.

Now we fail `s1`. This will trigger a global failover which cancels `src2`. The cancellation of `src2` will return `s2` to the `SlotPool`, where it is used to fulfill the slot request from `src3`. With the assignment of `s2 -> src3`, `sink1` learns about the locations of all of its inputs and will be scheduled. Since slot sharing is enabled, `s1`, which is currently being released, will be assigned to `sink1`. Due to this assignment we modify the `MultiTaskSlot#children` map, which causes the `ConcurrentModificationException`.
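For future readers, the failure mode described above can be reproduced in isolation. The following is only a minimal sketch and not Flink's actual code: `SharedSlot`, `allocateChild`, `release`, the `guardEnabled` flag and the release callback are all hypothetical stand-ins for `MultiTaskSlot` and the scheduling that happens while it releases its children. It shows that adding an entry to a `HashMap` while iterating over it throws a `ConcurrentModificationException`, and that refusing allocations on a slot that is already releasing (the idea in this PR's title) avoids the modification.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

class ReleasingSlotSketch {

    /** Hypothetical stand-in for a shared slot; NOT Flink's MultiTaskSlot. */
    static final class SharedSlot {
        private final Map<String, String> children = new HashMap<>();
        private final boolean guardEnabled;
        private boolean releasing;
        // Hypothetical hook: stands in for scheduling triggered while children are released.
        private Runnable onChildReleased = () -> {};

        SharedSlot(boolean guardEnabled) {
            this.guardEnabled = guardEnabled;
        }

        void setOnChildReleased(Runnable callback) {
            this.onChildReleased = callback;
        }

        /** Returns false if the guard rejects the request because the slot is releasing. */
        boolean allocateChild(String task) {
            if (guardEnabled && releasing) {
                return false;
            }
            children.put(task, "slot-for-" + task);
            return true;
        }

        void release() {
            releasing = true;
            // Iterating over the map while the callback may add entries to it.
            for (String task : children.keySet()) {
                onChildReleased.run();
            }
            children.clear();
        }
    }

    /** Releases a slot whose callback tries to allocate a new child for sink1. */
    static boolean releaseThrowsCme(boolean guardEnabled) {
        SharedSlot s1 = new SharedSlot(guardEnabled);
        s1.allocateChild("src1");
        s1.allocateChild("other"); // second child so the iterator advances after the modification
        s1.setOnChildReleased(() -> s1.allocateChild("sink1"));
        try {
            s1.release();
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("without guard, CME thrown: " + releaseThrowsCme(false));
        System.out.println("with guard, CME thrown: " + releaseThrowsCme(true));
    }
}
```

The second child is there only because a `HashMap` iterator detects the modification on its next call to `next()`; with a single child the loop would end before the check runs.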
