[
https://issues.apache.org/jira/browse/MESOS-457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Benjamin Mahler closed MESOS-457.
---------------------------------
Resolution: Cannot Reproduce
> Killing the slave while forked can cause the forked slave to deadlock.
> ----------------------------------------------------------------------
>
> Key: MESOS-457
> URL: https://issues.apache.org/jira/browse/MESOS-457
> Project: Mesos
> Issue Type: Bug
> Reporter: Benjamin Mahler
>
> This is related to MESOS-393.
> It was discovered on a CentOS Linux machine in production.
> A kill was issued to the slave while it was forked for executor launching, and
> the child slave then remained running, deadlocked at the following location:
> $ ps aux | grep mesos-slave
> bmahler 13626 0.0 0.0 61224 784 pts/1 S+ 21:28 0:00 grep mesos-slave
> root 48629 0.0 2.1 1156480 535644 ? S Apr29 0:00 /usr/local/sbin/mesos-slave --port=5051
> $ gdb -p 48629
> (gdb) where
> #0 0x00007f7612e484c4 in __lll_lock_wait () from /lib64/libpthread.so.0
> #1 0x00007f7612e43e1a in _L_lock_1034 () from /lib64/libpthread.so.0
> #2 0x00007f7612e43cdc in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3 0x00007f7611d8a990 in std::locale::locale() () from /usr/lib64/libstdc++.so.6
> #4 0x00007f7613694a4d in basic_ostringstream (this=0x80) at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/bits/basic_ios.h:446
> #5 process::UPID::operator std::string (this=0x80) at ../../../third_party/libprocess/src/pid.cpp:59
> #6 0x00007f761359febe in mesos::internal::slave::CgroupsIsolator::launchExecutor (this=0x7f7600005690, slaveId=..., frameworkId=..., frameworkInfo=..., executorInfo=..., uuid=..., directory=..., resources=...) at ../../src/slave/cgroups_isolator.cpp:578
> #7 0x00007f76134a2582 in operator()<mesos::internal::slave::Isolator*> (__functor=<value optimized out>, __a1=0x7f7600005690) at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/tr1/functional_iterate.h:214
> #8 std::tr1::_Function_handler<void ()(mesos::internal::slave::Isolator*),std::tr1::_Bind<std::tr1::_Mem_fn<void (mesos::internal::slave::Isolator::*)(const mesos::SlaveID&, const mesos::FrameworkID&, const mesos::FrameworkInfo&, const mesos::ExecutorInfo&, const UUID&, const std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, const mesos::internal::Resources&)> ()(std::tr1::_Placeholder<1>, mesos::SlaveID, mesos::FrameworkID, mesos::FrameworkInfo, mesos::ExecutorInfo, UUID, std::basic_string<char, std::char_traits<char>, std::allocator<char> >, mesos::internal::Resources)> >::_M_invoke(const std::tr1::_Any_data &, mesos::internal::slave::Isolator *) (__functor=<value optimized out>, __a1=0x7f7600005690) at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/tr1/functional_iterate.h:502
> #9 0x00007f76134ab4b4 in operator()<process::ProcessBase*> (__functor=<value optimized out>, __a1=0x7f76000058b0) at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/tr1/bind_iterate.h:45
> #10 std::tr1::_Function_handler<void ()(process::ProcessBase*),std::tr1::_Bind<void (* ()(std::tr1::_Placeholder<1>, std::tr1::shared_ptr<std::tr1::function<void ()(mesos::internal::slave::Isolator*)> >))(process::ProcessBase*, std::tr1::shared_ptr<std::tr1::function<void ()(mesos::internal::slave::Isolator*)> >)> >::_M_invoke(const std::tr1::_Any_data &, process::ProcessBase *) (__functor=<value optimized out>, __a1=0x7f76000058b0) at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/tr1/functional_iterate.h:502
> #11 0x00007f76136a799a in process::ProcessManager::resume (this=0x19cc6c0, process=0x7f76000058b0) at ../../../third_party/libprocess/src/process.cpp:2432
> #12 0x00007f76136a89af in process::schedule (arg=<value optimized out>) at ../../../third_party/libprocess/src/process.cpp:1167
> #13 0x00007f7612e4173d in start_thread () from /lib64/libpthread.so.0
> #14 0x00007f7611825f6d in clone () from /lib64/libc.so.6
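
One plausible reading of the backtrace above is the usual fork-with-threads hazard: fork() copies only the calling thread, so a mutex that some other thread held at the moment of the fork stays locked in the child forever. The issue was closed as Cannot Reproduce, so this is not a confirmed diagnosis. The sketch below is a hypothetical illustration, not Mesos code. `lock_mutex` and `child_inherits_locked_mutex` are made-up names, and a plain pthread mutex stands in for whatever lock std::locale::locale() takes in frames #0 to #3. It assumes Linux with glibc and uses pthread_mutex_trylock in the child so that the demo terminates instead of hanging; strictly speaking, only async-signal-safe calls are guaranteed to work in the child of a multithreaded process.

```cpp
#include <cassert>
#include <pthread.h>
#include <sys/wait.h>
#include <unistd.h>

#include <atomic>
#include <thread>

// Hypothetical stand-in for the library-internal lock seen in the backtrace.
static pthread_mutex_t lock_mutex = PTHREAD_MUTEX_INITIALIZER;

// Returns true if the forked child found the mutex already locked by a
// thread that does not exist in the child.
bool child_inherits_locked_mutex() {
  std::atomic<bool> held{false};
  std::atomic<bool> release{false};

  // Helper thread takes the lock and keeps it across the fork().
  std::thread holder([&] {
    pthread_mutex_lock(&lock_mutex);
    held = true;
    while (!release) {
      usleep(1000);
    }
    pthread_mutex_unlock(&lock_mutex);
  });

  // Wait until the helper thread really owns the mutex.
  while (!held) {
    usleep(1000);
  }

  pid_t pid = fork();
  if (pid == 0) {
    // Child: only the forking thread was copied, so nothing here can ever
    // unlock the mutex. A blocking pthread_mutex_lock() would hang like
    // frame #2; trylock reports the same state without blocking.
    int rc = pthread_mutex_trylock(&lock_mutex);
    _exit(rc == 0 ? 0 : 1);
  }

  // Parent: let the helper unlock its copy. The child has its own copy of
  // the mutex, so this does not help the child.
  release = true;
  holder.join();
  assert(pid > 0);  // fork() is expected to succeed in this sketch

  int status = 0;
  pid_t waited = waitpid(pid, &status, 0);
  assert(waited == pid);
  return WIFEXITED(status) && WEXITSTATUS(status) == 1;
}
```

Calling `child_inherits_locked_mutex()` from a small `main` (built with `-pthread`) should return true under these assumptions. The common ways to avoid the hazard are to call only async-signal-safe functions between fork() and exec(), or to do the string formatting before forking.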