[
https://issues.apache.org/jira/browse/FLINK-9676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
zhijiang updated FLINK-9676:
----------------------------
Description:
A deadlock can occur between the task canceler thread and the task thread.
The two threads acquire the same two locks in opposite order:
{{Task canceler thread -> IC1#releaseAllResources -> recycle floating buffers
-> {color:#d04437}lock{color}(LocalBufferPool#availableMemorySegments) ->
IC2#notifyBufferAvailable}} -> {color:#d04437}try to lock{color}(IC2#bufferQueue)
{{Task thread -> IC2#recycle -> {color:#d04437}lock{color}(IC2#bufferQueue) ->
bufferQueue#addExclusiveBuffer}} -> {{floatingBuffer#recycleBuffer}} ->
{color:#d04437}try to lock{color}(LocalBufferPool#availableMemorySegments)
One solution is to call {{listener#notifyBufferAvailable}} outside
the {{synchronized(availableMemorySegments)}} block in {{LocalBufferPool#recycle}}.
The existing RemoteInputChannelTest#testConcurrentOnSenderBacklogAndRecycle can
trigger this case, but the probability of hitting the deadlock is very low, so
this unit test is not stable.
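The proposed solution can be illustrated with a minimal sketch. It is not Flink's actual code: {{SketchBufferPool}}, {{SketchListener}} and all of their methods are hypothetical stand-ins, and only the lock name {{availableMemorySegments}} is taken from the description above. The point it shows is that the pool decides what to do with a recycled segment while holding its lock, but calls the listener only after the lock has been released, so the callback is free to take {{bufferQueue}} without creating a lock-order cycle.

```java
import java.util.ArrayDeque;

/** Hypothetical listener type; stands in for an input channel waiting for a buffer. */
interface SketchListener {
    void notifyBufferAvailable(Object segment);
}

/** Hypothetical, simplified pool. This is NOT Flink's LocalBufferPool. */
final class SketchBufferPool {
    // Doubles as the pool lock, mirroring the lock named in the description.
    private final ArrayDeque<Object> availableMemorySegments = new ArrayDeque<>();
    private final ArrayDeque<SketchListener> registeredListeners = new ArrayDeque<>();

    void addListener(SketchListener listener) {
        synchronized (availableMemorySegments) {
            registeredListeners.add(listener);
        }
    }

    void recycle(Object segment) {
        SketchListener listener;
        synchronized (availableMemorySegments) {
            // Only the bookkeeping happens under the pool lock.
            listener = registeredListeners.poll();
            if (listener == null) {
                // Nobody is waiting: return the segment to the pool.
                availableMemorySegments.add(segment);
                return;
            }
        }
        // The pool lock is released here, so the callback may acquire its own
        // lock (bufferQueue in the description) without holding ours.
        listener.notifyBufferAvailable(segment);
    }

    int availableCount() {
        synchronized (availableMemorySegments) {
            return availableMemorySegments.size();
        }
    }

    /** Test helper (assumption of this sketch): is the pool lock held by the caller? */
    boolean poolLockHeldByCurrentThread() {
        return Thread.holdsLock(availableMemorySegments);
    }
}
```

With this shape, a listener registered via {{addListener}} receives the next recycled segment without the pool lock being held, and a segment recycled while no listener is waiting simply goes back into the pool.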
was:
It may cause deadlock between task canceler thread and task thread.
The detail is as follows:
{{Task canceler thread -> IC1#releaseAllResources -> recycle floating buffers
-> {color:#d04437}lock{color}(LocalBufferPool#availableMemorySegments) ->
IC2#notifyBufferAvailable}} -> {color:#d04437}try to lock{color}(IC2#bufferQueue)
{{Task thread -> IC2#recycle -> {color:#d04437}lock{color}(IC2#bufferQueue) ->
bufferQueue#addExclusiveBuffer}} -> {{floatingBuffer#recycleBuffer}} ->
{color:#d04437}try to lock{color}(LocalBufferPool#availableMemorySegments)
One solution is that {{listener#notifyBufferAvailable}} can be called outside
the {{synchronized(availableMemorySegments)}} in {{LocalBufferPool#recycle}}.
> Deadlock during canceling task and recycling exclusive buffer
> -------------------------------------------------------------
>
> Key: FLINK-9676
> URL: https://issues.apache.org/jira/browse/FLINK-9676
> Project: Flink
> Issue Type: Bug
> Components: Network
> Affects Versions: 1.5.0
> Reporter: zhijiang
> Priority: Major
> Fix For: 1.5.1
>
>
> A deadlock can occur between the task canceler thread and the task thread.
> The two threads acquire the same two locks in opposite order:
> {{Task canceler thread -> IC1#releaseAllResources -> recycle floating buffers
> -> {color:#d04437}lock{color}(LocalBufferPool#availableMemorySegments) ->
> IC2#notifyBufferAvailable}} ->
> {color:#d04437}try to lock{color}(IC2#bufferQueue)
> {{Task thread -> IC2#recycle -> {color:#d04437}lock{color}(IC2#bufferQueue)
> -> bufferQueue#addExclusiveBuffer}} -> {{floatingBuffer#recycleBuffer}} ->
> {color:#d04437}try to lock{color}(LocalBufferPool#availableMemorySegments)
> One solution is to call {{listener#notifyBufferAvailable}} outside
> the {{synchronized(availableMemorySegments)}} block in {{LocalBufferPool#recycle}}.
> The existing RemoteInputChannelTest#testConcurrentOnSenderBacklogAndRecycle
> can trigger this case, but the probability of hitting the deadlock is very
> low, so this unit test is not stable.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)