[
https://issues.apache.org/jira/browse/MESOS-6909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15880497#comment-15880497
]
Milan Baran edited comment on MESOS-6909 at 2/28/17 9:00 AM:
-------------------------------------------------------------
I hit a similar issue.
I'd suggest extending the ABORT log with *argv* and *envp* for better debugging.
{code}
os::execvpe(path.c_str(), argv, envp);
ABORT("Failed to os::execvpe on path '" + path + "': " + os::strerror(errno));
{code}
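As a rough sketch of that suggestion: the helper below turns a NULL-terminated array such as *argv* or *envp* into a printable string that could be appended to the ABORT message. The name `joinNullTerminated` is hypothetical and is not part of Mesos, stout, or libprocess.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not an existing Mesos/libprocess function):
// formats a NULL-terminated array of C strings, such as argv or envp,
// as "['a', 'b', ...]" so it can be embedded in a log message.
std::string joinNullTerminated(char* const* array)
{
  if (array == nullptr) {
    return "(null)";
  }

  std::string result = "[";
  for (std::size_t i = 0; array[i] != nullptr; ++i) {
    if (i > 0) {
      result += ", ";
    }
    result += "'";
    result += array[i];
    result += "'";
  }
  result += "]";
  return result;
}
```

With something like this, the message could additionally carry `joinNullTerminated(argv)` and `joinNullTerminated(envp)`, which would show right away that the path and its arguments were passed together. Logging *envp* may expose secrets, so that part might need to be opt-in.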
My problem is with locating docker.
Full stack trace:
{code}
I0223 14:04:21.628685 644 docker.cpp:1022] Starting container
'af5156f0-b204-4d5b-9c10-f45dc386c8c2' for task
'xxxx.f9a088c7-f9d0-11e6-88f0-00505689f8fd' (and executor
'xxxx.f9a088c7-f9d0-11e6-88f0-00505689f8fd') of framework
cb0578e4-e2ed-4a7e-9a8c-ad946194f49b-0001
E0223 14:04:21.752529 644 slave.cpp:4423] Container
'af5156f0-b204-4d5b-9c10-f45dc386c8c2' for executor
'xxxx.f9a088c7-f9d0-11e6-88f0-00505689f8fd' of framework
cb0578e4-e2ed-4a7e-9a8c-ad946194f49b-0001 failed to start: Failed to run
'/usr/bin/docker --tls -H unix:///var/run/docker.sock pull busybox:latest':
terminated with signal Aborted; stderr='ABORT:
(../../../../../..//tmp/mesos-build/mesos-repo/3rdparty/libprocess/include/process/posix/subprocess.hpp:214):
Failed to os::execvpe on path '/usr/bin/docker --tls': No such file or
directory
*** Aborted at 1487858661 (unix time) try "date -d @1487858661" if you are
using GNU date ***
PC: @ 0x7f336d1dac37 (unknown)
*** SIGABRT (@0x2c8) received by PID 712 (TID 0x7f3365912700) from PID 712;
stack trace: ***
@ 0x7f336d579330 (unknown)
@ 0x7f336d1dac37 (unknown)
@ 0x7f336d1de028 (unknown)
@ 0x4131ac _Abort()
@ 0x4131ec _Abort()
@ 0x7f336f437a3f process::internal::childMain()
@ 0x7f336f436c3c std::_Function_handler<>::_M_invoke()
@ 0x7f336f436bf3 process::defaultClone()
@ 0x7f336f4388be process::internal::cloneChild()
@ 0x7f336f436120 process::subprocess()
@ 0x7f336e8962ea Docker::__pull()
@ 0x7f336e898ad7 Docker::_pull()
@ 0x7f336e8a679f std::_Function_handler<>::_M_invoke()
@ 0x7f336e8bfebe process::internal::thenf<>()
@ 0x7f336e931b26
_ZN7process8internal3runISt8functionIFvRKNS_6FutureI6OptionIiEEEEEJRS6_EEEvRKSt6vectorIT_SaISD_EEDpOT0_
@ 0x7f336e9341f7 process::Future<>::_set<>()
@ 0x7f336f436a3c process::internal::cleanup()
@ 0x7f336e931b26
_ZN7process8internal3runISt8functionIFvRKNS_6FutureI6OptionIiEEEEEJRS6_EEEvRKSt6vectorIT_SaISD_EEDpOT0_
@ 0x7f336e9341f7 process::Future<>::_set<>()
@ 0x7f336ed7d796
_ZN7process8internal3runISt8functionIFvRK6OptionIiEEEJRS4_EEEvRKSt6vectorIT_SaISB_EEDpOT0_
@ 0x7f336ed7d880 process::Future<>::_set<>()
@ 0x7f336f4312f4 process::ReaperProcess::notify()
@ 0x7f336f4314c2 process::ReaperProcess::wait()
@ 0x7f336f402451 process::ProcessManager::resume()
@ 0x7f336f402757
_ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
@ 0x7f336da4ea60 (unknown)
@ 0x7f336d571184 start_thread
@ 0x7f336d29e37d (unknown)
{code}
EDITED:
I did not notice that _execvpe_ takes an array of arguments as its second
parameter. So *--tls* has to be passed as a separate argument rather than as
part of the path, and that is what caused the problem. Maybe some validation
would be appropriate here.
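A minimal sketch of what such validation might look like, assuming a POSIX system. Both function names are hypothetical and do not exist in Mesos. `tokenize` splits a configured command on whitespace so the first token can serve as the path and the rest as arguments. `isExecutable` uses the POSIX `access()` call to check the path before attempting the exec.

```cpp
#include <sstream>
#include <string>
#include <vector>

#include <unistd.h>

// Hypothetical helper: splits a command line on whitespace.
// Quoting and escaping are deliberately not handled in this sketch.
std::vector<std::string> tokenize(const std::string& command)
{
  std::istringstream in(command);
  std::vector<std::string> tokens;
  std::string token;
  while (in >> token) {
    tokens.push_back(token);
  }
  return tokens;
}

// Hypothetical check: returns true if the calling process has execute
// permission on `path` according to access(2). Note that a directory
// with search permission also passes this check.
bool isExecutable(const std::string& path)
{
  return ::access(path.c_str(), X_OK) == 0;
}
```

For the value from the log above, `tokenize("/usr/bin/docker --tls")` yields `/usr/bin/docker` and `--tls` as separate tokens, while `isExecutable("/usr/bin/docker --tls")` fails because no file with that literal name exists. The check only makes sense for paths that contain a slash, since execvpe searches PATH for bare file names.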
> ABORT execvpe() crash when binaries from launcher_dir cannot be found
> ---------------------------------------------------------------------
>
> Key: MESOS-6909
> URL: https://issues.apache.org/jira/browse/MESOS-6909
> Project: Mesos
> Issue Type: Bug
> Components: agent
> Affects Versions: 1.1.0
> Reporter: Aaron Wood
> Assignee: Kevin Klues
>
> When running the Mesos agent either without --launcher_dir or with a
> --launcher_dir not pointing to the right place, you'll get a crash when
> tasks are launched:
> {code}
> E0111 10:50:56.665149 20924 slave.cpp:4423] Container
> '6cdd0c9b-cb29-42b0-b6cf-51f410df0f31' for executor
> '99D50FCB-ADB0-6B2A-3FC3-8A47FF178C10' of framework
> d3bc8031-29b6-4c2f-9fe3-a73c1b8b6360-0007 failed to start: Collect failed:
> Failed to setup hostname and network files: ABORT:
> (../../../3rdparty/libprocess/include/process/posix/subprocess.hpp:214):
> Failed to os::execvpe on path '/usr/local/libexec/mesos/mesos-containerizer':
> No such file or directory
> Aborted at 1484149856 (unix time) try "date -d @1484149856" if you are using
> GNU date ***
> PC: @ 0x7fc3bd418428 (unknown)
> SIGABRT (@0x51d8) received by PID 20952 (TID 0x7fc3b6007700) from PID 20952;
> stack trace: ***
> @ 0x7fc3bd7bd390 (unknown)
> @ 0x7fc3bd418428 (unknown)
> @ 0x7fc3bd41a02a (unknown)
> @ 0x47fafc _Abort()
> @ 0x47fb2a _Abort()
> @ 0x7fc3c385f092 process::internal::childMain()
> @ 0x7fc3c3864227
> _ZNSt5_BindIFPFiRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEPPcS9_RKN7process10Subprocess2IO20InputFileDescriptorsERKNSC_21OutputFileDescriptorsESI_bPiRKSt6vectorINSB_9ChildHookESaISL_EEES5_S9_S9_SD_SG_SG_bSJ_SN_EE6__callIiJEJLm0ELm1ELm2ELm3ELm4ELm5ELm6ELm7ELm8EEEET_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE
> @ 0x7fc3c38635d3 std::_Bind<>::operator()<>()
> @ 0x7fc3c3862682 std::_Function_handler<>::_M_invoke()
> @ 0x48a4b8 std::function<>::operator()()
> @ 0x7fc3c247de67 process::defaultClone()
> @ 0x7fc3c3861c40 std::_Function_handler<>::_M_invoke()
> @ 0x7fc3c3861411 std::function<>::operator()()
> @ 0x7fc3c385f8f5 process::internal::cloneChild()
> @ 0x7fc3c385d50e process::subprocess()
> @ 0x7fc3c30d318f
> mesos::internal::slave::NetworkCniIsolatorProcess::__isolate()
> @ 0x7fc3c30cf909
> mesos::internal::slave::NetworkCniIsolatorProcess::isolate()
> @ 0x7fc3c2d4db56
> _ZZN7process8dispatchI7NothingN5mesos8internal5slave20MesosIsolatorProcessERKNS2_11ContainerIDEiS6_iEENS_6FutureIT_EERKNS_3PIDIT0_EEMSD_FSB_T1_T2_ET3_T4_ENKUlPNS_11ProcessBaseEE_clESO_
> @ 0x7fc3c2d50eb8
> _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchI7NothingN5mesos8internal5slave20MesosIsolatorProcessERKNS6_11ContainerIDEiSA_iEENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSH_FSF_T1_T2_ET3_T4_EUlS2_E_E9_M_invokeERKSt9_Any_dataOS2_
> @ 0x7fc3c380a1dd std::function<>::operator()()
> @ 0x7fc3c37eb094 process::ProcessBase::visit()
> @ 0x7fc3c37f3b26 process::DispatchEvent::visit()
> @ 0x7fc3c2244a08 process::ProcessBase::serve()
> @ 0x7fc3c37e6f50 process::ProcessManager::resume()
> @ 0x7fc3c37e3a78
> _ZZN7process14ProcessManager12init_threadsEvENKUt_clEv
> @ 0x7fc3c37f3148
> _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEE9_M_invokeIJEEEvSt12_Index_tupleIJXspT_EEE
> @ 0x7fc3c37f309e
> _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEclEv
> @ 0x7fc3c37f302e
> _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
> @ 0x7fc3bdc97c80 (unknown)
> @ 0x7fc3bd7b36ba start_thread
> @ 0x7fc3bd4e982d (unknown)
> {code}
> Note that this does not crash hard so the agent stays running.