[ https://issues.apache.org/jira/browse/MESOS-7036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15847475#comment-15847475 ]

Alexander Rukletsov commented on MESOS-7036:
--------------------------------------------

The deadlock is most probably caused by an unfortunate combination of two
factors:
1) A dependency between the {{iterate}} callback (which holds a reference to
{{limiter}}) and the entity ({{limiter}}) that triggers *and clears* that
callback.
2) The lifetime of {{limiter}}, which is bounded by the copies of the
{{iterate}} callback.

If all but one of the {{iterate}} copies that reference {{limiter}} go out of
scope, the last copy is destroyed during {{clearAllCallbacks()}} in the
{{limiter}} context. Destroying it releases the last reference to {{limiter}},
so {{limiter}} is torn down from within its own process and ends up waiting on
the process it is currently executing, which is the deadlock reported below.
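To make the lifetime problem concrete, here is a minimal, self-contained C++ sketch of the failure mode, under a simplified single-threaded model. {{Context}}, {{Limiter}} and {{run}} are hypothetical names, not libprocess APIs, and "waiting on the process" is reduced to checking a flag instead of actually blocking. The context is kept outside the limiter only so that the sketch never destroys an object while one of its own member functions is still running.

{code}
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Hypothetical model of an actor's execution context: it stores callbacks
// and records whether it is currently executing one of its own methods.
struct Context {
  bool executing = false;
  std::vector<std::function<void()>> callbacks;

  // Loosely mirrors clearAllCallbacks(): the stored callbacks, and every
  // shared_ptr they captured, are destroyed while the context is executing.
  void clearAllCallbacks() {
    assert(!executing);
    executing = true;
    callbacks.clear();
    executing = false;
  }
};

// Hypothetical limiter whose destructor has to wait for its context.
struct Limiter {
  Limiter(Context& c, std::string& o) : context(c), outcome(o) {}

  ~Limiter() {
    // Waiting on a context from inside that same context can never finish;
    // the sketch records the outcome instead of blocking.
    outcome = context.executing ? "deadlock: waiting on self"
                                : "clean shutdown";
  }

  Context& context;
  std::string& outcome;
};

// If keepOutsideCopy is false, the callback holds the only remaining
// reference, so the limiter is destroyed inside clearAllCallbacks().
std::string run(bool keepOutsideCopy) {
  Context context;
  std::string outcome = "limiter still alive";
  {
    auto limiter = std::make_shared<Limiter>(context, outcome);

    // The "iterate" callback captures a copy of the limiter reference.
    context.callbacks.push_back([limiter] { (void)limiter; });

    std::shared_ptr<Limiter> survivor;
    if (keepOutsideCopy) {
      survivor = limiter;
    }

    limiter.reset();              // all other references go out of scope
    context.clearAllCallbacks();  // the last copy may be destroyed here
  }  // survivor, if any, is released here, outside the context
  return outcome;
}
{code}

Calling {{run(false)}} returns "deadlock: waiting on self", mirroring the scenario above, while {{run(true)}} returns "clean shutdown" because a reference outside the context keeps the limiter alive until after {{clearAllCallbacks()}} has returned.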

> Rate limiter deadlocks during IO Switchboard-related tests
> ----------------------------------------------------------
>
>                 Key: MESOS-7036
>                 URL: https://issues.apache.org/jira/browse/MESOS-7036
>             Project: Mesos
>          Issue Type: Bug
>          Components: test, tests
>         Environment: ASF CI
>            Reporter: Greg Mann
>              Labels: flaky, mesosphere
>         Attachments: AgentAPITest.LaunchNestedContainerSessionWithTTY.txt
>
>
> This has been observed a number of times recently on the ASF CI. While I 
> didn't look through every single failed test log, I've noticed the failure 
> occur during the following tests:
> {code}
> ContentType/AgentAPITest.LaunchNestedContainerSessionWithTTY/1
> ContentType/AgentAPITest.LaunchNestedContainerSessionWithTTY/0
> IOSwitchboardTest.ContainerAttachAfterSlaveRestart
> ContentType/AgentAPITest.LaunchNestedContainerSession/1
> ContentType/AgentAPITest.LaunchNestedContainerSessionDisconnected/1
> ContentType/AgentAPIStreamingTest.AttachContainerInput/0
> IOSwitchboardTest.ContainerAttach
> ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession/0
> {code}
> In all cases, we see the following:
> {code}
> **** DEADLOCK DETECTED! ****
> You are waiting on process __limiter__(518)@172.17.0.3:35849 that it is currently executing.
> {code}
> Find attached an entire example log.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
