[jira] [Resolved] (FLINK-2134) Deadlock in SuccessAfterNetworkBuffersFailureITCase

Ufuk Celebi (JIRA) Thu, 04 Jun 2015 02:18:52 -0700

     [ https://issues.apache.org/jira/browse/FLINK-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Ufuk Celebi resolved FLINK-2134.
--------------------------------
       Resolution: Fixed
    Fix Version/s: 0.9

Fixed via 0dea359.

> Deadlock in SuccessAfterNetworkBuffersFailureITCase
> ---------------------------------------------------
>
>                 Key: FLINK-2134
>                 URL: https://issues.apache.org/jira/browse/FLINK-2134
>             Project: Flink
>          Issue Type: Bug
>    Affects Versions: master
>            Reporter: Ufuk Celebi
>             Fix For: 0.9
>
>
> I ran into the issue in a Travis run for a PR:
> https://s3.amazonaws.com/archive.travis-ci.org/jobs/64994288/log.txt
> I can reproduce this locally by running SuccessAfterNetworkBuffersFailureITCase multiple times:
> {code}
> cluster = new ForkableFlinkMiniCluster(config, false);
> for (int i = 0; i < 100; i++) {
>    // run test programs CC, KMeans, CC
> }
> {code}
> The iteration tasks wait for superstep notifications like this:
> {code}
> "Join (Join at runConnectedComponents(SuccessAfterNetworkBuffersFailureITCase.java:128)) (8/6)" daemon prio=5 tid=0x00007f95f374f800 nid=0x138a7 in Object.wait() [0x0000000123f2a000]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>       at java.lang.Object.wait(Native Method)
>       - waiting on <0x00000007f89e3440> (a java.lang.Object)
>       at org.apache.flink.runtime.iterative.concurrent.SuperstepKickoffLatch.awaitStartOfSuperstepOrTermination(SuperstepKickoffLatch.java:57)
>       - locked <0x00000007f89e3440> (a java.lang.Object)
>       at org.apache.flink.runtime.iterative.task.IterationTailPactTask.run(IterationTailPactTask.java:131)
>       at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
>       at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
>       at java.lang.Thread.run(Thread.java:745)
> {code}
> I've asked [~rmetzger] to reproduce this, and it deadlocks for him as well.
> The system needs to be under some load, and the deadlock shows up only after multiple runs.
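
For readers unfamiliar with the wait pattern visible in the stack trace, the following is a minimal sketch of a latch that blocks a task until the next superstep starts or termination is signaled. It is an illustration only and not Flink's SuperstepKickoffLatch: the class name SuperstepLatchSketch, its fields, the 2000 ms timeout, and the unchecked exception on interruption are all assumptions. Only the method name awaitStartOfSuperstepOrTermination is taken from the trace.

```java
/**
 * Hypothetical latch, loosely modeled on the wait pattern in the stack trace.
 * This is NOT Flink's SuperstepKickoffLatch; all details are assumptions.
 */
final class SuperstepLatchSketch {

    private final Object monitor = new Object();
    private int superstep = 1;      // supersteps are assumed to start at 1
    private boolean terminated;

    /** Coordinator side: start the next superstep and wake all waiting tasks. */
    void triggerNextSuperstep() {
        synchronized (monitor) {
            superstep++;
            monitor.notifyAll();
        }
    }

    /** Coordinator side: signal that the iteration has finished. */
    void signalTermination() {
        synchronized (monitor) {
            terminated = true;
            monitor.notifyAll();
        }
    }

    /**
     * Task side: block until the expected superstep has started or termination
     * has been signaled. Returns true if the iteration terminated.
     */
    boolean awaitStartOfSuperstepOrTermination(int expectedSuperstep) {
        synchronized (monitor) {
            while (!terminated && superstep < expectedSuperstep) {
                try {
                    // A timed wait inside a loop: a thread parked here is
                    // reported as TIMED_WAITING (on object monitor).
                    monitor.wait(2000);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting", e);
                }
            }
            return terminated;
        }
    }
}
```

With a latch of this shape, a task whose notification never arrives keeps looping in the timed wait indefinitely, which would produce a thread dump like the one quoted above. The sketch does not say why the notification is missing in this issue.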



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
