Benjamin Mahler created MESOS-457:
-------------------------------------

             Summary: Killing the slave while forked can cause the forked slave to deadlock.
                 Key: MESOS-457
                 URL: https://issues.apache.org/jira/browse/MESOS-457
             Project: Mesos
          Issue Type: Bug
            Reporter: Benjamin Mahler


This is related to MESOS-393.

This was discovered on a CentOS Linux machine in production.

A kill was issued to the slave while it had forked to launch an executor. The 
forked child slave remained running, deadlocked at the following location:

$ ps aux | grep mesos-slave
bmahler  13626  0.0  0.0  61224   784 pts/1    S+   21:28   0:00 grep mesos-slave
root     48629  0.0  2.1 1156480 535644 ?      S    Apr29   0:00 /usr/local/sbin/mesos-slave --port=5051

$ gdb -p 48629
(gdb) where
#0  0x00007f7612e484c4 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f7612e43e1a in _L_lock_1034 () from /lib64/libpthread.so.0
#2  0x00007f7612e43cdc in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f7611d8a990 in std::locale::locale() () from /usr/lib64/libstdc++.so.6
#4  0x00007f7613694a4d in basic_ostringstream (this=0x80) at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/bits/basic_ios.h:446
#5  process::UPID::operator std::string (this=0x80) at ../../../third_party/libprocess/src/pid.cpp:59
#6  0x00007f761359febe in mesos::internal::slave::CgroupsIsolator::launchExecutor (this=0x7f7600005690, slaveId=..., frameworkId=..., frameworkInfo=..., executorInfo=..., uuid=..., directory=..., resources=...)
    at ../../src/slave/cgroups_isolator.cpp:578
#7  0x00007f76134a2582 in operator()<mesos::internal::slave::Isolator*> (__functor=<value optimized out>, __a1=0x7f7600005690)
    at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/tr1/functional_iterate.h:214
#8  std::tr1::_Function_handler<void ()(mesos::internal::slave::Isolator*),std::tr1::_Bind<std::tr1::_Mem_fn<void (mesos::internal::slave::Isolator::*)(const mesos::SlaveID&, const mesos::FrameworkID&, const mesos::FrameworkInfo&, const mesos::ExecutorInfo&, const UUID&, const std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, const mesos::internal::Resources&)> ()(std::tr1::_Placeholder<1>, mesos::SlaveID, mesos::FrameworkID, mesos::FrameworkInfo, mesos::ExecutorInfo, UUID, std::basic_string<char, std::char_traits<char>, std::allocator<char> >, mesos::internal::Resources)> >::_M_invoke(const std::tr1::_Any_data &, mesos::internal::slave::Isolator *) (__functor=<value optimized out>, __a1=0x7f7600005690)
    at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/tr1/functional_iterate.h:502
#9  0x00007f76134ab4b4 in operator()<process::ProcessBase*> (__functor=<value optimized out>, __a1=0x7f76000058b0)
    at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/tr1/bind_iterate.h:45
#10 std::tr1::_Function_handler<void ()(process::ProcessBase*),std::tr1::_Bind<void (* ()(std::tr1::_Placeholder<1>, std::tr1::shared_ptr<std::tr1::function<void ()(mesos::internal::slave::Isolator*)> >))(process::ProcessBase*, std::tr1::shared_ptr<std::tr1::function<void ()(mesos::internal::slave::Isolator*)> >)> >::_M_invoke(const std::tr1::_Any_data &, process::ProcessBase *) (__functor=<value optimized out>, __a1=0x7f76000058b0)
    at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/tr1/functional_iterate.h:502
#11 0x00007f76136a799a in process::ProcessManager::resume (this=0x19cc6c0, process=0x7f76000058b0) at ../../../third_party/libprocess/src/process.cpp:2432
#12 0x00007f76136a89af in process::schedule (arg=<value optimized out>) at ../../../third_party/libprocess/src/process.cpp:1167
#13 0x00007f7612e4173d in start_thread () from /lib64/libpthread.so.0
#14 0x00007f7611825f6d in clone () from /lib64/libc.so.6
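
One plausible reading of the backtrace above (an assumption; the report itself 
does not state the cause): frame #4 constructs a basic_ostringstream in the 
forked child, which constructs a std::locale and blocks on a mutex. After 
fork() in a multi-threaded process only the calling thread exists in the 
child, so a lock that another thread held at fork time is never released 
there. POSIX accordingly limits such a child to async-signal-safe functions 
until it calls exec. A minimal sketch of that discipline, with all string 
formatting done before fork(), might look like the following. The name 
forkAndReport and the "executor(N)" label are hypothetical and are not Mesos 
code.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <sstream>
#include <string>

// Hypothetical helper, not Mesos code: formats a label in the parent, forks,
// and lets the child emit it using only async-signal-safe calls.
std::string forkAndReport(int id)
{
  // Do all formatting BEFORE fork(). Constructing an ostringstream also
  // constructs a std::locale, which may take a lock inside libstdc++.
  std::ostringstream out;
  out << "executor(" << id << ")\n";
  const std::string message = out.str();

  int fds[2];
  if (pipe(fds) != 0) {
    return "";
  }

  pid_t pid = fork();
  if (pid < 0) {
    close(fds[0]);
    close(fds[1]);
    return "";
  }

  if (pid == 0) {
    // Child: only close(), write() and _exit(), all async-signal-safe.
    // No allocation, no iostreams, no locale construction.
    close(fds[0]);
    ssize_t ignored = write(fds[1], message.data(), message.size());
    (void) ignored;
    _exit(0);
  }

  // Parent: read what the child wrote until EOF, then reap the child.
  close(fds[1]);
  std::string result;
  char buffer[256];
  ssize_t n;
  while ((n = read(fds[0], buffer, sizeof(buffer))) > 0) {
    result.append(buffer, static_cast<size_t>(n));
  }
  close(fds[0]);

  int status = 0;
  waitpid(pid, &status, 0);
  return result;
}
```

Calling forkAndReport(7) returns the line the child wrote through the pipe. 
Whether the same restructuring applies to the code around 
cgroups_isolator.cpp:578 is not something the backtrace alone can confirm.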

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira