[ https://issues.apache.org/jira/browse/MESOS-8594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16459348#comment-16459348 ]
Benjamin Mahler commented on MESOS-8594:
----------------------------------------
{noformat}
commit 8a639ca63bd8071245e270aecdda574aec6f8d3e
Author: Benjamin Mahler <[email protected]>
Date: Sat Apr 28 18:28:39 2018 -0700

Reduced likelihood of a stack overflow in libprocess socket send path.

Currently, the socket send path is implemented using an asynchronous
loop with callbacks. Without using `process::loop`, this pattern is
prone to a stack overflow in the case that all asynchronous calls
complete synchronously. This is possible with sockets if the socket
is always ready for writing. Users have reported the crash in both
MESOS-8594 and MESOS-8834, so the stack overflow is encountered in
practice.

This patch updates the send path to leverage `process::loop`, which
is supposed to prevent stack overflows in asynchronous loops. However,
it is still possible for `process::loop` to stack overflow due to
MESOS-8852. In practice, I expect that even without MESOS-8852 fixed,
users won't see any stack overflows in the send path.

Review: https://reviews.apache.org/r/66863
{noformat}
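The failure mode described in the commit message can be illustrated with a minimal, self-contained sketch. This is not libprocess code: `AlwaysReadySocket`, `DepthStats`, `sendRecursive`, `sendLooped`, and the two `maxDepth*` helpers are hypothetical names invented for illustration, and the loop variant only approximates the general idea of iterating instead of recursing, not the actual implementation of `process::loop`. The sketch counts how many send-path frames are live at once when every "asynchronous" send completes synchronously.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical stand-in for a socket that is always ready for writing:
// the "asynchronous" send runs its completion callback before returning.
struct AlwaysReadySocket {
  void send(const std::function<void()>& onSent) { onSent(); }
};

// Tracks how many send-path frames are live at the same time.
struct DepthStats {
  std::size_t current = 0;
  std::size_t deepest = 0;
  void enter() { ++current; deepest = std::max(deepest, current); }
  void leave() { --current; }
};

// Callback style: the continuation issues the next send itself, so each
// synchronous completion nests one more frame on the stack.
void sendRecursive(AlwaysReadySocket& socket, std::size_t chunks,
                   DepthStats& stats) {
  if (chunks == 0) return;
  stats.enter();
  socket.send([&socket, chunks, &stats]() {
    sendRecursive(socket, chunks - 1, stats);
  });
  stats.leave();
}

// Loop style: the continuation only records completion; the next send is
// issued by the loop after the previous send has returned.
void sendLooped(AlwaysReadySocket& socket, std::size_t chunks,
                DepthStats& stats) {
  while (chunks > 0) {
    bool completed = false;
    stats.enter();
    socket.send([&completed]() { completed = true; });
    stats.leave();
    if (!completed) break;  // A real loop would resume from the callback.
    --chunks;
  }
}

// Returns the deepest nesting reached by the callback-style send path.
std::size_t maxDepthRecursive(std::size_t chunks) {
  AlwaysReadySocket socket;
  DepthStats stats;
  sendRecursive(socket, chunks, stats);
  assert(stats.current == 0);  // Every frame has unwound.
  return stats.deepest;
}

// Returns the deepest nesting reached by the loop-style send path.
std::size_t maxDepthLooped(std::size_t chunks) {
  AlwaysReadySocket socket;
  DepthStats stats;
  sendLooped(socket, chunks, stats);
  assert(stats.current == 0);  // Every frame has unwound.
  return stats.deepest;
}
```

Under these assumptions, `maxDepthRecursive(n)` grows linearly with the number of chunks, which is what eventually exhausts the stack under load, while `maxDepthLooped(n)` stays constant regardless of `n`.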
> Mesos master crash (under load)
> -------------------------------
>
> Key: MESOS-8594
> URL: https://issues.apache.org/jira/browse/MESOS-8594
> Project: Mesos
> Issue Type: Bug
> Components: master
> Affects Versions: 1.3.2, 1.4.1, 1.5.0, 1.6.0
> Reporter: A. Dukhovniy
> Assignee: Benjamin Mahler
> Priority: Blocker
> Labels: reliability
> Fix For: 1.6.0
>
> Attachments: lldb-bt.txt, lldb-di-f.txt, lldb-image-section.txt,
> lldb-regiser-read.txt
>
>
> Mesos master crashes under load. Attached is some information from `lldb`:
> {code:java}
> Process 41933 resuming
> Process 41933 stopped
> * thread #10, stop reason = EXC_BAD_ACCESS (code=2, address=0x7000089ecff8)
> frame #0: 0x000000010c30ddb6 libmesos-1.6.0.dylib`::_Some() at some.hpp:35
> 32 template <typename T>
> 33 struct _Some
> 34 {
> -> 35 _Some(T _t) : t(std::move(_t)) {}
> 36
> 37 T t;
> 38 };
> Target 0: (mesos-master) stopped.
> (lldb)
> {code}
> To quote [~abudnik]
> {quote}it’s the stack overflow bug in libprocess due to the way
> `internal::send()` and `internal::_send()` are implemented in `process.cpp`
> {quote}