[ 
https://issues.apache.org/jira/browse/MESOS-8594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16459348#comment-16459348
 ] 

Benjamin Mahler commented on MESOS-8594:
----------------------------------------

{noformat}
commit 8a639ca63bd8071245e270aecdda574aec6f8d3e
Author: Benjamin Mahler
Date:   Sat Apr 28 18:28:39 2018 -0700

    Reduced likelihood of a stack overflow in libprocess socket send path.

    Currently, the socket send path is implemented using an asynchronous
    loop with callbacks. Without using `process::loop`, this pattern is
    prone to a stack overflow in the case that all asynchronous calls
    complete synchronously. This is possible with sockets if the socket
    is always ready for writing. Users have reported the crash in both
    MESOS-8594 and MESOS-8834, so the stack overflow is encountered in
    practice.

    This patch updates the send path to leverage `process::loop`, which
    is supposed to prevent stack overflows in asynchronous loops. However,
    it is still possible for `process::loop` to stack overflow due to
    MESOS-8852. In practice, I expect that even without MESOS-8852 fixed,
    users won't see any stack overflows in the send path.

    Review: https://reviews.apache.org/r/66863
{noformat}
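
To illustrate the failure mode the commit message describes, here is a minimal sketch. It is not libprocess code: `fakeAsyncWrite`, `DepthTracker`, and both `send*` functions are hypothetical names, and the "socket" is assumed to always be ready for writing, so every write completes synchronously. The first variant chains the next write from the completion callback, so the stack grows by one level per chunk. The second drives the same work from a plain loop, which is roughly the idea behind `process::loop`, and stays at constant depth.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>

// Records how deeply the send calls are nested on the call stack.
struct DepthTracker {
  std::size_t current = 0;
  std::size_t deepest = 0;
  void enter() { deepest = std::max(deepest, ++current); }
  void leave() { --current; }
};

// Hypothetical stand-in for a socket that is always ready for writing:
// the write "completes" by invoking the callback inline.
void fakeAsyncWrite(std::size_t chunk,
                    const std::function<void(std::size_t)>& done) {
  done(chunk);
}

// Callback-driven loop: each completion callback issues the next write.
// With synchronous completion, every chunk adds nested stack frames.
// `chunk` must be greater than zero.
void sendWithCallbacks(std::size_t remaining, std::size_t chunk,
                       DepthTracker& depth) {
  depth.enter();
  if (remaining > 0) {
    fakeAsyncWrite(std::min(chunk, remaining), [&](std::size_t written) {
      sendWithCallbacks(remaining - written, chunk, depth);
    });
  }
  depth.leave();
}

// Loop-driven variant: the callback only records the result and returns,
// so the stack unwinds before the next write is issued.
// `chunk` must be greater than zero.
void sendWithLoop(std::size_t remaining, std::size_t chunk,
                  DepthTracker& depth) {
  while (remaining > 0) {
    std::size_t written = 0;
    depth.enter();
    fakeAsyncWrite(std::min(chunk, remaining),
                   [&](std::size_t n) { written = n; });
    depth.leave();
    remaining -= written;
  }
}

// Helpers that report the deepest nesting observed for each variant.
std::size_t maxDepthWithCallbacks(std::size_t total, std::size_t chunk) {
  DepthTracker depth;
  sendWithCallbacks(total, chunk, depth);
  return depth.deepest;
}

std::size_t maxDepthWithLoop(std::size_t total, std::size_t chunk) {
  DepthTracker depth;
  sendWithLoop(total, chunk, depth);
  return depth.deepest;
}
```

In this sketch the callback variant's nesting depth grows linearly with the number of chunks, while the loop variant's depth does not depend on the payload size. A large enough payload would therefore exhaust the stack only in the first variant.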

> Mesos master crash (under load)
> -------------------------------
>
>                 Key: MESOS-8594
>                 URL: https://issues.apache.org/jira/browse/MESOS-8594
>             Project: Mesos
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 1.3.2, 1.4.1, 1.5.0, 1.6.0
>            Reporter: A. Dukhovniy
>            Assignee: Benjamin Mahler
>            Priority: Blocker
>              Labels: reliability
>             Fix For: 1.6.0
>
>         Attachments: lldb-bt.txt, lldb-di-f.txt, lldb-image-section.txt, 
> lldb-regiser-read.txt
>
>
> Mesos master crashes under load. Attached is some output from `lldb`:
> {code:java}
> Process 41933 resuming
> Process 41933 stopped
> * thread #10, stop reason = EXC_BAD_ACCESS (code=2, address=0x7000089ecff8)
> frame #0: 0x000000010c30ddb6 libmesos-1.6.0.dylib`::_Some() at some.hpp:35
> 32 template <typename T>
> 33 struct _Some
> 34 {
> -> 35 _Some(T _t) : t(std::move(_t)) {}
> 36
> 37 T t;
> 38 };
> Target 0: (mesos-master) stopped.
> (lldb)
> {code}
> To quote [~abudnik]
> {quote}it’s the stack overflow bug in libprocess due to the way 
> `internal::send()` and `internal::_send()` are implemented in `process.cpp`
> {quote}


