[ https://issues.apache.org/jira/browse/MESOS-6184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Deshi Xiao updated MESOS-6184:
------------------------------
    Comment: was deleted

(was: I have rebased the patch onto the 1.2.0 branch codebase. When testing it, it always produces a core dump file:
```
I0328 11:48:12.922181 48 exec.cpp:162] Version: 1.2.0
I0328 11:48:12.929252 54 exec.cpp:237] Executor registered on agent a29dc3a5-3e3f-4058-8ab4-dd7de2ae58d1-S4
I0328 11:48:12.931640 54 docker.cpp:850] Running docker -H unix:///var/run/docker.sock run --cpu-shares 10 --memory 33554432 --env-file /tmp/gvqGyb -v /data/mesos/slaves/a29dc3a5-3e3f-4058-8ab4-dd7de2ae58d1-S4/frameworks/d7ef5d2b-f924-42d9-a274-c020afba6bce-0000/executors/0-hc-xychu-datamanmesos-2f3b47f9ffc048539c7b22baa6c32d8f/runs/458189b8-2ff4-4337-ad3a-67321e96f5cb:/mnt/mesos/sandbox --net bridge --label=USER_NAME=xychu --label=GROUP_NAME=groupautotest --label=APP_ID=hc --label=VCLUSTER=clusterautotest --label=USER=xychu --label=CLUSTER=datamanmesos --label=SLOT=0 --label=APP=hc -p 31000:80/tcp --name mesos-a29dc3a5-3e3f-4058-8ab4-dd7de2ae58d1-S4.458189b8-2ff4-4337-ad3a-67321e96f5cb nginx
I0328 11:48:16.145714 53 health_checker.cpp:196] Ignoring failure as health check still in grace period
W0328 11:48:26.289958 49 health_checker.cpp:202] Health check failed 1 times consecutively: HTTP health check failed: curl returned terminated with signal Aborted (core dumped): ABORT: (../../../3rdparty/libprocess/include/process/posix/subprocess.hpp:190): Failed to execute Subprocess::ChildHook: Failed to enter the net namespace of pid 18596: Pid 18596 does not exist
*** Aborted at 1490672906 (unix time) try "date -d @1490672906" if you are using GNU date ***
PC: @ 0x7f26bfb485f7 __GI_raise
*** SIGABRT (@0x4a) received by PID 74 (TID 0x7f26ba152700) from PID 74; stack trace: ***
    @ 0x7f26c0703100 (unknown)
    @ 0x7f26bfb485f7 __GI_raise
    @ 0x7f26bfb49ce8 __GI_abort
    @ 0x7f26c315778e _Abort()
    @ 0x7f26c31577cc _Abort()
    @ 0x7f26c237a4b6 process::internal::childMain()
    @ 0x7f26c2379e9c std::_Function_handler<>::_M_invoke()
    @ 0x7f26c2379e53 process::internal::defaultClone()
    @ 0x7f26c237b951 process::internal::cloneChild()
    @ 0x7f26c237954f process::subprocess()
    @ 0x7f26c15a9fb1 mesos::internal::checks::HealthCheckerProcess::httpHealthCheck()
    @ 0x7f26c15ababd mesos::internal::checks::HealthCheckerProcess::performSingleCheck()
    @ 0x7f26c2331389 process::ProcessManager::resume()
    @ 0x7f26c233a3f7 _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
    @ 0x7f26c04a1220 (unknown)
    @ 0x7f26c06fbdc5 start_thread
    @ 0x7f26bfc0928d __clone
W0328 11:48:36.340055 55 health_checker.cpp:202] Health check failed 2 times consecutively: HTTP health check failed: curl returned terminated with signal Aborted (core dumped): ABORT: (../../../3rdparty/libprocess/include/process/posix/subprocess.hpp:190): Failed to execute Subprocess::ChildHook: Failed to enter the net namespace of pid 18596: Pid 18596 does not exist
*** Aborted at 1490672916 (unix time) try "date -d @1490672916" if you are using GNU date ***
PC: @ 0x7f26bfb485f7 __GI_raise
*** SIGABRT (@0x4b) received by PID 75 (TID 0x7f26b9951700) from PID 75; stack trace: ***
    @ 0x7f26c0703100 (unknown)
    @ 0x7f26bfb485f7 __GI_raise
    @ 0x7f26bfb49ce8 __GI_abort
    @ 0x7f26c315778e _Abort()
    @ 0x7f26c31577cc _Abort()
    @ 0x7f26c237a4b6 process::internal::childMain()
    @ 0x7f26c2379e9c std::_Function_handler<>::_M_invoke()
    @ 0x7f26c2379e53 process::internal::defaultClone()
    @ 0x7f26c237b951 process::internal::cloneChild()
    @ 0x7f26c237954f process::subprocess()
    @ 0x7f26c15a9fb1 mesos::internal::checks::HealthCheckerProcess::httpHealthCheck()
    @ 0x7f26c15ababd mesos::internal::checks::HealthCheckerProcess::performSingleCheck()
    @ 0x7f26c2331389 process::ProcessManager::resume()
    @ 0x7f26c233a3f7 _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
    @ 0x7f26c04a1220 (unknown)
    @ 0x7f26c06fbdc5 start_thread
    @ 0x7f26bfc0928d __clone
W0328 11:48:46.386533 49 health_checker.cpp:202] Health check failed 3 times consecutively: HTTP health check failed: curl returned terminated with signal Aborted (core dumped): ABORT: (../../../3rdparty/libprocess/include/process/posix/subprocess.hpp:190): Failed to execute Subprocess::ChildHook: Failed to enter the net namespace of pid 18596: Pid 18596 does not exist
*** Aborted at 1490672926 (unix time) try "date -d @1490672926" if you are using GNU date ***
PC: @ 0x7f26bfb485f7 __GI_raise
*** SIGABRT (@0x4c) received by PID 76 (TID 0x7f26ba152700) from PID 76; stack trace: ***
    @ 0x7f26c0703100 (unknown)
    @ 0x7f26bfb485f7 __GI_raise
    @ 0x7f26bfb49ce8 __GI_abort
    @ 0x7f26c315778e _Abort()
    @ 0x7f26c31577cc _Abort()
    @ 0x7f26c237a4b6 process::internal::childMain()
    @ 0x7f26c2379e9c std::_Function_handler<>::_M_invoke()
    @ 0x7f26c2379e53 process::internal::defaultClone()
    @ 0x7f26c237b951 process::internal::cloneChild()
    @ 0x7f26c237954f process::subprocess()
    @ 0x7f26c15a9fb1 mesos::internal::checks::HealthCheckerProcess::httpHealthCheck()
    @ 0x7f26c15ababd mesos::internal::checks::HealthCheckerProcess::performSingleCheck()
    @ 0x7f26c2331389 process::ProcessManager::resume()
    @ 0x7f26c233a3f7 _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
    @ 0x7f26c04a1220 (unknown)
    @ 0x7f26c06fbdc5 start_thread
    @ 0x7f26bfc0928d __clone
W0328 11:48:56.531623 53 health_checker.cpp:202] Health check failed 4 times consecutively: HTTP health check failed: curl returned terminated with signal Aborted (core dumped): ABORT: (../../../3rdparty/libprocess/include/process/posix/subprocess.hpp:190): Failed to execute Subprocess::ChildHook: Failed to enter the net namespace of pid 18596: Pid 18596 does not exist
*** Aborted at 1490672936 (unix time) try "date -d @1490672936" if you are using GNU date ***
PC: @ 0x7f26bfb485f7 __GI_raise
*** SIGABRT (@0x4d) received by PID 77 (TID 0x7f26b814e700) from PID 77; stack trace: ***
    @ 0x7f26c0703100 (unknown)
    @ 0x7f26bfb485f7 __GI_raise
    @ 0x7f26bfb49ce8 __GI_abort
    @ 0x7f26c315778e _Abort()
    @ 0x7f26c31577cc _Abort()
    @ 0x7f26c237a4b6 process::internal::childMain()
    @ 0x7f26c2379e9c std::_Function_handler<>::_M_invoke()
    @ 0x7f26c2379e53 process::internal::defaultClone()
    @ 0x7f26c237b951 process::internal::cloneChild()
    @ 0x7f26c237954f process::subprocess()
    @ 0x7f26c15a9fb1 mesos::internal::checks::HealthCheckerProcess::httpHealthCheck()
    @ 0x7f26c15ababd mesos::internal::checks::HealthCheckerProcess::performSingleCheck()
    @ 0x7f26c2331389 process::ProcessManager::resume()
    @ 0x7f26c233a3f7 _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
    @ 0x7f26c04a1220 (unknown)
    @ 0x7f26c06fbdc5 start_thread
    @ 0x7f26bfc0928d __clone
W0328 11:49:06.678515 50 health_checker.cpp:202] Health check failed 5 times consecutively: HTTP health check failed: curl returned terminated with signal Aborted (core dumped): ABORT: (../../../3rdparty/libprocess/include/process/posix/subprocess.hpp:190): Failed to execute Subprocess::ChildHook: Failed to enter the net namespace of pid 18596: Pid 18596 does not exist
*** Aborted at 1490672946 (unix time) try "date -d @1490672946" if you are using GNU date ***
PC: @ 0x7f26bfb485f7 __GI_raise
*** SIGABRT (@0x4e) received by PID 78 (TID 0x7f26b9951700) from PID 78; stack trace: ***
    @ 0x7f26c0703100 (unknown)
    @ 0x7f26bfb485f7 __GI_raise
    @ 0x7f26bfb49ce8 __GI_abort
    @ 0x7f26c315778e _Abort()
    @ 0x7f26c31577cc _Abort()
    @ 0x7f26c237a4b6 process::internal::childMain()
    @ 0x7f26c2379e9c std::_Function_handler<>::_M_invoke()
    @ 0x7f26c2379e53 process::internal::defaultClone()
    @ 0x7f26c237b951 process::internal::cloneChild()
    @ 0x7f26c237954f process::subprocess()
    @ 0x7f26c15a9fb1 mesos::internal::checks::HealthCheckerProcess::httpHealthCheck()
    @ 0x7f26c15ababd mesos::internal::checks::HealthCheckerProcess::performSingleCheck()
    @ 0x7f26c2331389 process::ProcessManager::resume()
    @ 0x7f26c233a3f7 _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
    @ 0x7f26c04a1220 (unknown)
    @ 0x7f26c06fbdc5 start_thread
    @ 0x7f26bfc0928d __clone
I0328 11:49:06.678840 50 health_checker.cpp:130] Health checking stopped
I0328 11:49:06.880620 49 health_checker.cpp:130] Health checking stopped
```)

> Health checks should use a general mechanism to enter namespaces of the task.
> -----------------------------------------------------------------------------
>
> Key: MESOS-6184
> URL: https://issues.apache.org/jira/browse/MESOS-6184
> Project: Mesos
> Issue Type: Improvement
> Reporter: haosdent
> Assignee: haosdent
> Priority: Critical
> Labels: health-check, mesosphere
>
> To perform health checks for tasks, we need to enter the corresponding
> namespaces of the container. For now, the health check uses a custom clone
> function to implement this:
> {code}
> return process::defaultClone([=]() -> int {
>   if (taskPid.isSome()) {
>     foreach (const string& ns, namespaces) {
>       Try<Nothing> setns = ns::setns(taskPid.get(), ns);
>       if (setns.isError()) {
>         ...
>       }
>     }
>   }
>   return func();
> });
> {code}
> After the childHooks patches are merged, we could change the health check to
> use childHooks to call {{setns}} and make {{process::defaultClone}} private
> again.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
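The issue proposes moving the {{setns}} calls out of the custom clone function and into a child hook. Below is a minimal sketch of that idea in plain POSIX C++. It is not the libprocess {{Subprocess::ChildHook}} interface that appears in the log above, and {{ChildHook}}, {{enterNetNamespace}} and {{runWithHooks}} are hypothetical names. Each hook runs in the child between {{fork()}} and {{exec()}}. In the deleted log, a failing ChildHook hits the ABORT at subprocess.hpp:190. In this sketch, a failing hook instead prints its error and exits with status 127, so a task pid that no longer exists produces a failed check rather than a core dump. Calling {{setns()}} on a network namespace requires CAP_SYS_ADMIN.

```cpp
// Hypothetical sketch, not the libprocess API: ChildHook, enterNetNamespace
// and runWithHooks are illustrative names only.
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // exposes setns() and CLONE_NEWNET in <sched.h>
#endif

#include <fcntl.h>
#include <sched.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>
#include <cstring>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// A hook runs in the child after fork() and before exec(). It returns
// std::nullopt on success or an error message on failure.
using ChildHook = std::function<std::optional<std::string>()>;

// Returns a hook that moves the calling process into the network namespace
// of `pid` by opening /proc/<pid>/ns/net and passing that fd to setns().
ChildHook enterNetNamespace(pid_t pid) {
  return [pid]() -> std::optional<std::string> {
    const std::string path = "/proc/" + std::to_string(pid) + "/ns/net";
    const int fd = ::open(path.c_str(), O_RDONLY | O_CLOEXEC);
    if (fd < 0) {
      const int saved = errno;  // e.g. ENOENT when the task pid is gone
      return "Failed to open " + path + ": " + std::strerror(saved);
    }
    const int rc = ::setns(fd, CLONE_NEWNET);
    const int saved = errno;
    ::close(fd);
    if (rc != 0) {
      return "setns(" + path + ") failed: " + std::strerror(saved);
    }
    return std::nullopt;
  };
}

// Forks, runs every hook in the child, then execs argv[0] with argv.
// A failing hook writes its error to stderr and exits with status 127
// instead of calling abort(), so no core dump is produced; a failed exec
// exits with 126. Returns the child's exit status, or -1 if argv is empty,
// fork()/waitpid() failed, or the child was killed by a signal.
int runWithHooks(const std::vector<ChildHook>& hooks,
                 std::vector<std::string> argv) {
  if (argv.empty()) {
    return -1;
  }
  const pid_t child = ::fork();
  if (child < 0) {
    return -1;
  }
  if (child == 0) {
    for (const ChildHook& hook : hooks) {
      if (const std::optional<std::string> error = hook()) {
        const std::string message = "Child hook failed: " + *error + "\n";
        const ssize_t ignored =
            ::write(STDERR_FILENO, message.data(), message.size());
        (void)ignored;
        ::_exit(127);
      }
    }
    std::vector<char*> args;
    for (std::string& arg : argv) {
      args.push_back(arg.data());
    }
    args.push_back(nullptr);
    ::execvp(args[0], args.data());
    ::_exit(126);  // only reached if exec failed
  }
  int status = 0;
  while (::waitpid(child, &status, 0) < 0) {
    if (errno != EINTR) {
      return -1;
    }
  }
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

As an example, {{runWithHooks({enterNetNamespace(taskPid)}, {"curl", "-sf", "http://127.0.0.1:80/"})}} would run curl inside the task's network namespace, assuming a hypothetical {{taskPid}} that holds the container's pid. The sketch allocates memory in the child after {{fork()}}. That is only safe in a single-threaded program. In a multithreaded process such as the executor, code between {{fork()}} and {{exec()}} should be limited to async-signal-safe calls.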