[ 
https://issues.apache.org/jira/browse/MESOS-7921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16151119#comment-16151119
 ] 

Yan Xu commented on MESOS-7921:
-------------------------------

So libprocess GC would delete the managed processes upon their exit: 
https://github.com/apache/mesos/blob/1ae308c2f1344d9e62e094ab11cc195c96eb5c04/3rdparty/libprocess/include/process/gc.hpp#L45

{code:title=}
  virtual void exited(const UPID& pid)
  {
    if (processes.count(pid) > 0) {
      const ProcessBase* process = processes[pid];
      processes.erase(pid);
      delete process;
    }
  }
{code}

What happens when another process that is waiting on it donates its thread to 
this process, and this process is terminated after it has been extracted from 
the run queue? Could it be destructed before it is resumed?
 
https://github.com/apache/mesos/blob/1ae308c2f1344d9e62e094ab11cc195c96eb5c04/3rdparty/libprocess/src/process.cpp#L3581-L3587

{code:title=}
  if (process != nullptr) {
    VLOG(2) << "Donating thread to " << process->pid << " while waiting";
    ProcessBase* donator = __process__;
    resume(process);
    running.fetch_sub(1);
    __process__ = donator;
  }
{code}

> process::EventQueue sometimes crashes
> -------------------------------------
>
>                 Key: MESOS-7921
>                 URL: https://issues.apache.org/jira/browse/MESOS-7921
>             Project: Mesos
>          Issue Type: Bug
>          Components: libprocess
>    Affects Versions: 1.4.0
>         Environment: autotools,gcc,--verbose,GLOG_v=1 
> MESOS_VERBOSE=1,ubuntu:14.04,(ubuntu)&&(!ubuntu-us1)&&(!ubuntu-eu2)
> Note that --enable-lock-free-event-queue is not enabled.
> Details: 
> https://builds.apache.org/job/Mesos-Buildbot/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=--verbose,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu%3A14.04,label_exp=(ubuntu)&&(!ubuntu-us1)&&(!ubuntu-eu2)/4159/injectedEnvVars/
>            Reporter: Yan Xu
>            Priority: Blocker
>         Attachments: 
> FetcherCacheTest.CachedCustomOutputFileWithSubdirectory.log.txt, 
> MesosContainerizerSlaveRecoveryTest.ResourceStatisticsFullLog.txt
>
>
> The following segfault is found on 
> [ASF|https://builds.apache.org/job/Mesos-Buildbot/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=--verbose,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu%3A14.04,label_exp=(ubuntu)&&(!ubuntu-us1)&&(!ubuntu-eu2)/4159/]
>  in {{MesosContainerizerSlaveRecoveryTest.ResourceStatistics}} but it's flaky 
> and shows up in other tests and environments (with or without 
> --enable-lock-free-event-queue) as well.
> {noformat: title=Configuration}
> ./bootstrap '&&' ./configure --verbose '&&' make -j6 distcheck
> {noformat}
> {noformat:title=}
> *** Aborted at 1503937885 (unix time) try "date -d @1503937885" if you are 
> using GNU date ***
> PC: @     0x2b9e2581caa0 process::EventQueue::Consumer::empty()
> *** SIGSEGV (@0x8) received by PID 751 (TID 0x2b9e31978700) from PID 8; stack 
> trace: ***
>     @     0x2b9e29d26330 (unknown)
>     @     0x2b9e2581caa0 process::EventQueue::Consumer::empty()
>     @     0x2b9e25800a40 process::ProcessManager::resume()
>     @     0x2b9e2580f891 
> process::ProcessManager::init_threads()::$_9::operator()()
>     @     0x2b9e2580f7d5 
> _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvE3$_9vEE9_M_invokeIJEEEvSt12_Index_tupleIJXspT_EEE
>     @     0x2b9e2580f7a5 std::_Bind_simple<>::operator()()
>     @     0x2b9e2580f77c std::thread::_Impl<>::_M_run()
>     @     0x2b9e29fe5a60 (unknown)
>     @     0x2b9e29d1e184 start_thread
>     @     0x2b9e2a851ffd (unknown)
> make[3]: *** [CMakeFiles/check] Segmentation fault (core dumped)
> {noformat}
> A [email protected] query shows many such instances: 
> https://lists.apache.org/[email protected]:lte=1M:process%3A%3AEventQueue%3A%3AConsumer%3A%3Aempty



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
