[
https://issues.apache.org/jira/browse/MESOS-8729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414876#comment-16414876
]
Benjamin Mahler commented on MESOS-8729:
----------------------------------------
Looking at the last stack:
...
#8 0x00007f09d2ac1aac in synchronize<std::recursive_mutex> () at
../../3rdparty/stout/include/stout/synchronized.hpp:58
#9 0x00007f09d492c37b in process::ProcessManager::use () at
../../../3rdparty/libprocess/src/process.cpp:2520
#10 0x00007f09d492e955 in process::ProcessManager::deliver () at
../../../3rdparty/libprocess/src/process.cpp:2775
// Trying to get a reference but blocked on the lock.
...
#66 0x00007f09d492e988 in process::ProcessManager::deliver () at
[../../../3rdparty/libprocess/src/process.cpp:2776|https://github.com/apache/mesos/blob/2e2e38628c1b580a231ddac5270f9848ea4af7af/3rdparty/libprocess/src/process.cpp?utf8=%E2%9C%93#L2776]
// XXX Holds a reference!
...
This thread is in the middle of a deliver (frame #66, holding a reference) and
synchronously calls back into deliver (frame #10), which blocks on the lock
while the outer reference is still held. The first thread, which holds that
lock, is therefore stuck spinning while it waits for the reference to be
released, and the reference will never be released.
I understand the issue now but haven't thought through a fix.
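The cycle described above can be sketched with a small standalone model. This is not libprocess code: `Model`, `use`, `cleanup`, and `simulateNestedDeliver` are hypothetical names, and the timed mutex and spin budget are assumptions added only so that the sketch terminates instead of hanging.

```cpp
#include <atomic>
#include <chrono>
#include <mutex>
#include <thread>

// Hypothetical model, not libprocess code: one lock plus a reference count.
struct Model {
  std::recursive_timed_mutex mutex;  // stands in for the lock taken in use()
  std::atomic<int> refs{0};          // outstanding references to a process
  std::atomic<bool> locked{false};   // set once the cleanup thread owns the lock
};

// Nested deliver: needs the lock to take another reference.
// Returns false if the lock could not be acquired within `timeout`.
bool use(Model& m, std::chrono::milliseconds timeout) {
  if (!m.mutex.try_lock_for(timeout)) {
    return false;
  }
  ++m.refs;
  m.mutex.unlock();
  return true;
}

// Cleanup: holds the lock and spins until all references are dropped.
// Returns false if references are still held when `budget` runs out.
bool cleanup(Model& m, std::chrono::milliseconds budget) {
  std::lock_guard<std::recursive_timed_mutex> guard(m.mutex);
  m.locked = true;
  const auto deadline = std::chrono::steady_clock::now() + budget;
  while (m.refs.load() > 0) {
    if (std::chrono::steady_clock::now() >= deadline) {
      return false;
    }
    std::this_thread::yield();
  }
  return true;
}

// Returns true when each side is stuck waiting on the other.
bool simulateNestedDeliver() {
  Model m;
  m.refs = 1;  // the outer deliver already holds a reference

  bool cleaned = true;
  std::thread cleaner([&] {
    cleaned = cleanup(m, std::chrono::milliseconds(500));
  });

  // Wait until the cleanup thread owns the lock and is spinning.
  while (!m.locked.load()) {
    std::this_thread::yield();
  }

  // Nested deliver on the thread that still holds the outer reference.
  const bool acquired = use(m, std::chrono::milliseconds(20));

  cleaner.join();
  return !acquired && !cleaned;
}
```

In this model `simulateNestedDeliver()` reports that the nested `use` could not take the lock and that the spinning side never saw the reference count reach zero. With a plain blocking lock and an unbounded spin, neither side would make progress.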
> Libprocess: deadlock in process::finalize
> -----------------------------------------
>
> Key: MESOS-8729
> URL: https://issues.apache.org/jira/browse/MESOS-8729
> Project: Mesos
> Issue Type: Bug
> Components: libprocess
> Affects Versions: 1.6.0
> Environment: The issue has been reproduced on Ubuntu 16.04, master
> branch, commit `42848653b2`.
> Reporter: Andrei Budnik
> Priority: Major
> Labels: deadlock, libprocess
> Attachments: deadlock.txt
>
>
> Since we are calling
> [`process::finalize()`|https://github.com/apache/mesos/blob/02ebf9986ab5ce883a71df72e9e3392a3e37e40e/src/slave/containerizer/mesos/io/switchboard_main.cpp#L157]
> before returning from the IOSwitchboard's main function, we expect that all
> HTTP responses are going to be sent back to clients before IOSwitchboard
> terminates. However, after [adding|https://reviews.apache.org/r/66147/]
> `process::finalize()`, we have seen that IOSwitchboard might get stuck in
> `process::finalize()`. See the attached stack trace.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)