tillrohrmann edited a comment on issue #9288: [FLINK-13421] [runtime] Do not 
allocate slots in a MultiTaskSlot when it’s releasing children
URL: https://github.com/apache/flink/pull/9288#issuecomment-517262657
 
 
   Thanks a lot @zhuzhurk. I think with your test case I was able to understand 
the problem. Here is the description for future reference:
   
   The underlying problem is that a `MultiTaskSlot` that is currently being 
released might be reused by other slot requests, as @zhuzhurk stated. This can 
happen in the following case: assume we have the following topology with slot 
sharing enabled, eager deployment, and two allocated slots `s1` and `s2`.
   
   ```
   src1 \
   src2 - sink1 
   src3 /
   ```
   
   Flink would assign `s1 -> src1` and `s2 -> src2`. Since `sink1` has three 
predecessors, it will wait for `src3` to get a slot assigned before it can be 
scheduled.
   
   Now we fail `s1`. This will trigger a global failover which cancels `src2`. 
The cancellation of `src2` will return `s2` to the `SlotPool`, where it is used 
to fulfill the slot request from `src3`. With the assignment of `s2 -> src3`, 
`sink1` learns about the locations of all of its inputs and will be scheduled. 
Since slot sharing is enabled, `s1`, which is currently being released, will be 
assigned to `sink1`. This assignment modifies the `MultiTaskSlot#children` map 
while the release is still in progress on that same slot, which causes the 
`ConcurrentModificationException`.
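
   The failure mode can be reproduced in isolation. The following is a minimal sketch, not Flink's actual code: `SharedSlotSketch`, `allocateChild`, `release` and the `guardEnabled` flag are hypothetical names, and the sketch assumes that the children are kept in a `HashMap` and that release iterates over it. It registers two children so that the iterator has to advance after the map was modified.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a shared slot; this is NOT Flink's MultiTaskSlot.
final class SharedSlotSketch {
    // Child name -> callback that runs when the child is released.
    private final Map<String, Runnable> children = new HashMap<>();
    private final boolean guardEnabled;
    private boolean releasing;

    SharedSlotSketch(boolean guardEnabled) {
        this.guardEnabled = guardEnabled;
    }

    // Adds a child. With the guard enabled, requests are rejected
    // while the slot is releasing its children.
    boolean allocateChild(String name, Runnable onRelease) {
        if (guardEnabled && releasing) {
            return false;
        }
        children.put(name, onRelease);
        return true;
    }

    // Releases all children. The callbacks run while we iterate over the map,
    // so a callback that adds a child modifies the map mid-iteration.
    void release() {
        releasing = true;
        for (Runnable onRelease : children.values()) {
            onRelease.run();
        }
        children.clear();
    }

    // Runs the scenario and reports whether release() failed with a
    // ConcurrentModificationException.
    static boolean releaseThrowsCme(boolean guardEnabled) {
        SharedSlotSketch slot = new SharedSlotSketch(guardEnabled);
        // Stands in for the failover path that ends up scheduling sink1
        // into the slot that is being released.
        Runnable failover = () -> slot.allocateChild("sink1", () -> {});
        slot.allocateChild("src1", failover);
        slot.allocateChild("other", failover); // hypothetical second child
        try {
            slot.release();
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("without guard, CME thrown: " + releaseThrowsCme(false));
        System.out.println("with guard, CME thrown: " + releaseThrowsCme(true));
    }
}
```

   Without the guard, the first release callback adds `sink1` to the map and the next iterator step fails. With the guard, the request is rejected while the slot is releasing, so the iteration completes. Rejecting new allocations during release matches the intent stated in the PR title, but how the real fix is implemented is not shown here.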

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services