[ https://issues.apache.org/jira/browse/FLINK-5228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ufuk Celebi closed FLINK-5228.
------------------------------
Resolution: Fixed
Fixed in 388acbc (release-1.1), 3229dc0 (master).
> LocalInputChannel re-trigger request and release deadlock
> ---------------------------------------------------------
>
> Key: FLINK-5228
> URL: https://issues.apache.org/jira/browse/FLINK-5228
> Project: Flink
> Issue Type: Bug
> Components: Network
> Reporter: Ufuk Celebi
> Assignee: Ufuk Celebi
> Priority: Critical
> Fix For: 1.2.0, 1.1.4
>
>
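> Concurrent release and re-triggering of a partition request can lead to a
> deadlock, because the two code paths acquire the same two monitors in
> opposite order. The task canceler takes a lock in
> SingleInputGate.releaseAllResources() and then waits for a lock in
> LocalInputChannel, while the timer thread holds that LocalInputChannel lock
> in requestSubpartition() and then waits for the SingleInputGate lock in
> retriggerPartitionRequest().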
> {code}
> Found one Java-level deadlock:
> =============================
> "Canceler for Map -> Sink: Unnamed (1/4)":
>   waiting to lock monitor 0x0000000001e27bd8 (object 0x00000000ffa1f688, a java.lang.Object),
>   which is held by "Timer-3"
> "Timer-3":
>   waiting to lock monitor 0x00007fdbd029ec48 (object 0x00000000ffa1f3a0, a java.lang.Object),
>   which is held by "Canceler for Map -> Sink: Unnamed (1/4)"
> Java stack information for the threads listed above:
> ===================================================
> "Canceler for Map -> Sink: Unnamed (1/4)":
> 	at org.apache.flink.runtime.io.network.partition.consumer.LocalInputChannel.releaseAllResources(LocalInputChannel.java:240)
> 	- waiting to lock <0x00000000ffa1f688> (a java.lang.Object)
> 	at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.releaseAllResources(SingleInputGate.java:348)
> 	- locked <0x00000000ffa1f3a0> (a java.lang.Object)
> 	at org.apache.flink.runtime.taskmanager.Task$TaskCanceler.run(Task.java:1280)
> 	at java.lang.Thread.run(Thread.java:745)
> "Timer-3":
> 	at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.retriggerPartitionRequest(SingleInputGate.java:307)
> 	- waiting to lock <0x00000000ffa1f3a0> (a java.lang.Object)
> 	at org.apache.flink.runtime.io.network.partition.consumer.LocalInputChannel.requestSubpartition(LocalInputChannel.java:128)
> 	- locked <0x00000000ffa1f688> (a java.lang.Object)
> 	at org.apache.flink.runtime.io.network.partition.consumer.LocalInputChannel$1.run(LocalInputChannel.java:148)
> 	at java.util.TimerThread.mainLoop(Timer.java:555)
> 	at java.util.TimerThread.run(Timer.java:505)
> {code}
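The trace is a classic lock-order inversion. The following is a simplified, hypothetical Java model of it, not Flink code. `gateLock` stands in for the monitor taken in SingleInputGate (object 0x00000000ffa1f3a0), `channelLock` for the one taken in LocalInputChannel (object 0x00000000ffa1f688). The class `LockOrderDemo` and its methods are illustrative names. A latch forces both threads to hold their first lock before taking the second, so the inverted variant deadlocks deterministically. `ThreadMXBean.findDeadlockedThreads()` from the JDK detects the same kind of monitor cycle that jstack prints as "Found one Java-level deadlock". The "fixed" variant shows one common way to break such a cycle: never take the gate lock while holding the channel lock. It is an assumption for illustration and not necessarily what commits 388acbc / 3229dc0 do.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

/** Hypothetical model of the SingleInputGate / LocalInputChannel lock-order inversion. */
public class LockOrderDemo {

    /** Returns true if the JVM reports these two threads as deadlocked. */
    public static boolean run(boolean fixed) {
        final Object gateLock = new Object();    // stands in for the SingleInputGate monitor
        final Object channelLock = new Object(); // stands in for the LocalInputChannel monitor
        final boolean[] channelReleased = {false}; // guarded by channelLock
        // Both threads must hold their first lock before either takes a second one.
        final CountDownLatch bothHoldFirst = new CountDownLatch(2);

        // Release path (cf. TaskCanceler -> SingleInputGate.releaseAllResources):
        // gate lock first, then channel lock.
        Thread canceler = new Thread(() -> {
            synchronized (gateLock) {
                arriveAndAwait(bothHoldFirst);
                synchronized (channelLock) {
                    channelReleased[0] = true;
                }
            }
        }, "canceler");

        // Re-trigger path (cf. Timer -> LocalInputChannel.requestSubpartition).
        Thread timer = new Thread(() -> {
            if (fixed) {
                boolean retrigger;
                synchronized (channelLock) {
                    retrigger = !channelReleased[0]; // decide under the channel lock
                }
                arriveAndAwait(bothHoldFirst);
                if (retrigger) {
                    synchronized (gateLock) {
                        // call into the gate only after the channel lock is released
                    }
                }
            } else {
                synchronized (channelLock) {
                    arriveAndAwait(bothHoldFirst);
                    synchronized (gateLock) { // channel lock still held: inverted order
                        // re-trigger the partition request
                    }
                }
            }
        }, "timer");

        // Deadlocked threads never finish; daemon threads let the JVM exit anyway.
        canceler.setDaemon(true);
        timer.setDaemon(true);
        canceler.start();
        timer.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        try {
            for (int i = 0; i < 50; i++) { // poll for up to ~5 seconds
                if (involves(mx.findDeadlockedThreads(), canceler, timer)) {
                    return true;
                }
                if (!canceler.isAlive() && !timer.isAlive()) {
                    return false; // both paths completed
                }
                Thread.sleep(100);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return false;
    }

    /** True if any of the given threads is in the JVM-wide deadlocked set (may be null). */
    private static boolean involves(long[] deadlockedIds, Thread... threads) {
        if (deadlockedIds == null) {
            return false;
        }
        for (long id : deadlockedIds) {
            for (Thread t : threads) {
                if (t.getId() == id) {
                    return true;
                }
            }
        }
        return false;
    }

    private static void arriveAndAwait(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("inverted lock order deadlocks: " + run(false));
        System.out.println("fixed lock order deadlocks:    " + run(true));
    }
}
```

`run(false)` should report a deadlock and `run(true)` should not. The check filters by thread ID because `findDeadlockedThreads()` is JVM-wide, and the daemon threads from the first run stay deadlocked. In the fixed variant the channel may be released between the check and the gate call. So the gate side would still need to tolerate a re-trigger for an already released channel, for example by checking its own state under its lock.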
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)