[ 
https://issues.apache.org/jira/browse/MESOS-6184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deshi Xiao updated MESOS-6184:
------------------------------
    Comment: was deleted

(was: I have rebased the patch onto the 1.2.0 branch codebase. When testing it, 
it always produces a core dump file.

```
I0328 11:48:12.922181    48 exec.cpp:162] Version: 1.2.0
I0328 11:48:12.929252    54 exec.cpp:237] Executor registered on agent 
a29dc3a5-3e3f-4058-8ab4-dd7de2ae58d1-S4
I0328 11:48:12.931640    54 docker.cpp:850] Running docker -H 
unix:///var/run/docker.sock run --cpu-shares 10 --memory 33554432 --env-file 
/tmp/gvqGyb -v 
/data/mesos/slaves/a29dc3a5-3e3f-4058-8ab4-dd7de2ae58d1-S4/frameworks/d7ef5d2b-f924-42d9-a274-c020afba6bce-0000/executors/0-hc-xychu-datamanmesos-2f3b47f9ffc048539c7b22baa6c32d8f/runs/458189b8-2ff4-4337-ad3a-67321e96f5cb:/mnt/mesos/sandbox
 --net bridge --label=USER_NAME=xychu --label=GROUP_NAME=groupautotest 
--label=APP_ID=hc --label=VCLUSTER=clusterautotest --label=USER=xychu 
--label=CLUSTER=datamanmesos --label=SLOT=0 --label=APP=hc -p 31000:80/tcp 
--name 
mesos-a29dc3a5-3e3f-4058-8ab4-dd7de2ae58d1-S4.458189b8-2ff4-4337-ad3a-67321e96f5cb
 nginx
I0328 11:48:16.145714    53 health_checker.cpp:196] Ignoring failure as health 
check still in grace period
W0328 11:48:26.289958    49 health_checker.cpp:202] Health check failed 1 times 
consecutively: HTTP health check failed: curl returned terminated with signal 
Aborted (core dumped): ABORT: 
(../../../3rdparty/libprocess/include/process/posix/subprocess.hpp:190): Failed 
to execute Subprocess::ChildHook: Failed to enter the net namespace of pid 
18596: Pid 18596 does not exist
*** Aborted at 1490672906 (unix time) try "date -d @1490672906" if you are 
using GNU date ***
PC: @     0x7f26bfb485f7 __GI_raise
*** SIGABRT (@0x4a) received by PID 74 (TID 0x7f26ba152700) from PID 74; stack 
trace: ***
    @     0x7f26c0703100 (unknown)
    @     0x7f26bfb485f7 __GI_raise
    @     0x7f26bfb49ce8 __GI_abort
    @     0x7f26c315778e _Abort()
    @     0x7f26c31577cc _Abort()
    @     0x7f26c237a4b6 process::internal::childMain()
    @     0x7f26c2379e9c std::_Function_handler<>::_M_invoke()
    @     0x7f26c2379e53 process::internal::defaultClone()
    @     0x7f26c237b951 process::internal::cloneChild()
    @     0x7f26c237954f process::subprocess()
    @     0x7f26c15a9fb1 
mesos::internal::checks::HealthCheckerProcess::httpHealthCheck()
    @     0x7f26c15ababd 
mesos::internal::checks::HealthCheckerProcess::performSingleCheck()
    @     0x7f26c2331389 process::ProcessManager::resume()
    @     0x7f26c233a3f7 
_ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
    @     0x7f26c04a1220 (unknown)
    @     0x7f26c06fbdc5 start_thread
    @     0x7f26bfc0928d __clone
W0328 11:48:36.340055    55 health_checker.cpp:202] Health check failed 2 times 
consecutively: HTTP health check failed: curl returned terminated with signal 
Aborted (core dumped): ABORT: 
(../../../3rdparty/libprocess/include/process/posix/subprocess.hpp:190): Failed 
to execute Subprocess::ChildHook: Failed to enter the net namespace of pid 
18596: Pid 18596 does not exist
*** Aborted at 1490672916 (unix time) try "date -d @1490672916" if you are 
using GNU date ***
PC: @     0x7f26bfb485f7 __GI_raise
*** SIGABRT (@0x4b) received by PID 75 (TID 0x7f26b9951700) from PID 75; stack 
trace: ***
    @     0x7f26c0703100 (unknown)
    @     0x7f26bfb485f7 __GI_raise
    @     0x7f26bfb49ce8 __GI_abort
    @     0x7f26c315778e _Abort()
    @     0x7f26c31577cc _Abort()
    @     0x7f26c237a4b6 process::internal::childMain()
    @     0x7f26c2379e9c std::_Function_handler<>::_M_invoke()
    @     0x7f26c2379e53 process::internal::defaultClone()
    @     0x7f26c237b951 process::internal::cloneChild()
    @     0x7f26c237954f process::subprocess()
    @     0x7f26c15a9fb1 
mesos::internal::checks::HealthCheckerProcess::httpHealthCheck()
    @     0x7f26c15ababd 
mesos::internal::checks::HealthCheckerProcess::performSingleCheck()
    @     0x7f26c2331389 process::ProcessManager::resume()
    @     0x7f26c233a3f7 
_ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
    @     0x7f26c04a1220 (unknown)
    @     0x7f26c06fbdc5 start_thread
    @     0x7f26bfc0928d __clone
W0328 11:48:46.386533    49 health_checker.cpp:202] Health check failed 3 times 
consecutively: HTTP health check failed: curl returned terminated with signal 
Aborted (core dumped): ABORT: 
(../../../3rdparty/libprocess/include/process/posix/subprocess.hpp:190): Failed 
to execute Subprocess::ChildHook: Failed to enter the net namespace of pid 
18596: Pid 18596 does not exist
*** Aborted at 1490672926 (unix time) try "date -d @1490672926" if you are 
using GNU date ***
PC: @     0x7f26bfb485f7 __GI_raise
*** SIGABRT (@0x4c) received by PID 76 (TID 0x7f26ba152700) from PID 76; stack 
trace: ***
    @     0x7f26c0703100 (unknown)
    @     0x7f26bfb485f7 __GI_raise
    @     0x7f26bfb49ce8 __GI_abort
    @     0x7f26c315778e _Abort()
    @     0x7f26c31577cc _Abort()
    @     0x7f26c237a4b6 process::internal::childMain()
    @     0x7f26c2379e9c std::_Function_handler<>::_M_invoke()
    @     0x7f26c2379e53 process::internal::defaultClone()
    @     0x7f26c237b951 process::internal::cloneChild()
    @     0x7f26c237954f process::subprocess()
    @     0x7f26c15a9fb1 
mesos::internal::checks::HealthCheckerProcess::httpHealthCheck()
    @     0x7f26c15ababd 
mesos::internal::checks::HealthCheckerProcess::performSingleCheck()
    @     0x7f26c2331389 process::ProcessManager::resume()
    @     0x7f26c233a3f7 
_ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
    @     0x7f26c04a1220 (unknown)
    @     0x7f26c06fbdc5 start_thread
    @     0x7f26bfc0928d __clone
W0328 11:48:56.531623    53 health_checker.cpp:202] Health check failed 4 times 
consecutively: HTTP health check failed: curl returned terminated with signal 
Aborted (core dumped): ABORT: 
(../../../3rdparty/libprocess/include/process/posix/subprocess.hpp:190): Failed 
to execute Subprocess::ChildHook: Failed to enter the net namespace of pid 
18596: Pid 18596 does not exist
*** Aborted at 1490672936 (unix time) try "date -d @1490672936" if you are 
using GNU date ***
PC: @     0x7f26bfb485f7 __GI_raise
*** SIGABRT (@0x4d) received by PID 77 (TID 0x7f26b814e700) from PID 77; stack 
trace: ***
    @     0x7f26c0703100 (unknown)
    @     0x7f26bfb485f7 __GI_raise
    @     0x7f26bfb49ce8 __GI_abort
    @     0x7f26c315778e _Abort()
    @     0x7f26c31577cc _Abort()
    @     0x7f26c237a4b6 process::internal::childMain()
    @     0x7f26c2379e9c std::_Function_handler<>::_M_invoke()
    @     0x7f26c2379e53 process::internal::defaultClone()
    @     0x7f26c237b951 process::internal::cloneChild()
    @     0x7f26c237954f process::subprocess()
    @     0x7f26c15a9fb1 
mesos::internal::checks::HealthCheckerProcess::httpHealthCheck()
    @     0x7f26c15ababd 
mesos::internal::checks::HealthCheckerProcess::performSingleCheck()
    @     0x7f26c2331389 process::ProcessManager::resume()
    @     0x7f26c233a3f7 
_ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
    @     0x7f26c04a1220 (unknown)
    @     0x7f26c06fbdc5 start_thread
    @     0x7f26bfc0928d __clone
W0328 11:49:06.678515    50 health_checker.cpp:202] Health check failed 5 times 
consecutively: HTTP health check failed: curl returned terminated with signal 
Aborted (core dumped): ABORT: 
(../../../3rdparty/libprocess/include/process/posix/subprocess.hpp:190): Failed 
to execute Subprocess::ChildHook: Failed to enter the net namespace of pid 
18596: Pid 18596 does not exist
*** Aborted at 1490672946 (unix time) try "date -d @1490672946" if you are 
using GNU date ***
PC: @     0x7f26bfb485f7 __GI_raise
*** SIGABRT (@0x4e) received by PID 78 (TID 0x7f26b9951700) from PID 78; stack 
trace: ***
    @     0x7f26c0703100 (unknown)
    @     0x7f26bfb485f7 __GI_raise
    @     0x7f26bfb49ce8 __GI_abort
    @     0x7f26c315778e _Abort()
    @     0x7f26c31577cc _Abort()
    @     0x7f26c237a4b6 process::internal::childMain()
    @     0x7f26c2379e9c std::_Function_handler<>::_M_invoke()
    @     0x7f26c2379e53 process::internal::defaultClone()
    @     0x7f26c237b951 process::internal::cloneChild()
    @     0x7f26c237954f process::subprocess()
    @     0x7f26c15a9fb1 
mesos::internal::checks::HealthCheckerProcess::httpHealthCheck()
    @     0x7f26c15ababd 
mesos::internal::checks::HealthCheckerProcess::performSingleCheck()
    @     0x7f26c2331389 process::ProcessManager::resume()
    @     0x7f26c233a3f7 
_ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
    @     0x7f26c04a1220 (unknown)
    @     0x7f26c06fbdc5 start_thread
    @     0x7f26bfc0928d __clone
I0328 11:49:06.678840    50 health_checker.cpp:130] Health checking stopped
I0328 11:49:06.880620    49 health_checker.cpp:130] Health checking stopped
```)

> Health checks should use a general mechanism to enter namespaces of the task.
> -----------------------------------------------------------------------------
>
>                 Key: MESOS-6184
>                 URL: https://issues.apache.org/jira/browse/MESOS-6184
>             Project: Mesos
>          Issue Type: Improvement
>            Reporter: haosdent
>            Assignee: haosdent
>            Priority: Critical
>              Labels: health-check, mesosphere
>
> To perform health checks for tasks, we need to enter the corresponding 
> namespaces of the container. For now, health checks use a custom clone 
> function to implement this:
> {code}
>   return process::defaultClone([=]() -> int {
>     if (taskPid.isSome()) {
>       foreach (const string& ns, namespaces) {
>         Try<Nothing> setns = ns::setns(taskPid.get(), ns);
>         if (setns.isError()) {
>           ...
>         }
>       }
>     }
>     return func();
>   });
> {code}
> After the childHooks patches are merged, we could change the health check to 
> use childHooks to call {{setns}} and make {{process::defaultClone}} private 
> again.
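
For illustration, the namespace-entering step described above can be sketched as a standalone, Linux-only helper that could run in the child before the check command is executed. This is a minimal sketch under stated assumptions, not the Mesos implementation: it calls the raw setns(2) system call instead of the {{ns::setns}} helper, and the names nsPath, enterNamespace and enterNamespaces are hypothetical.

```cpp
// Linux-only sketch. The helper names below are hypothetical and are not
// part of Mesos or libprocess.
#include <fcntl.h>
#include <sched.h>
#include <sys/types.h>
#include <unistd.h>

#include <cerrno>
#include <cstring>
#include <string>
#include <vector>

// Builds the procfs path of one namespace handle, e.g. "/proc/42/ns/net".
std::string nsPath(pid_t pid, const std::string& ns)
{
  return "/proc/" + std::to_string(pid) + "/ns/" + ns;
}

// Joins namespace `ns` of process `pid`.
// Returns an empty string on success, otherwise an error message.
std::string enterNamespace(pid_t pid, const std::string& ns)
{
  const std::string path = nsPath(pid, ns);

  int fd = ::open(path.c_str(), O_RDONLY | O_CLOEXEC);
  if (fd < 0) {
    // Typical causes: the pid has already exited, or access is denied.
    return "Failed to open '" + path + "': " + std::strerror(errno);
  }

  // An nstype of 0 lets the descriptor refer to any namespace type.
  if (::setns(fd, 0) != 0) {
    const std::string error = std::strerror(errno);
    ::close(fd);
    return "Failed to enter the " + ns + " namespace of pid " +
           std::to_string(pid) + ": " + error;
  }

  ::close(fd);
  return "";
}

// Enters each listed namespace in order and stops at the first failure.
std::string enterNamespaces(
    pid_t pid, const std::vector<std::string>& namespaces)
{
  for (const std::string& ns : namespaces) {
    const std::string error = enterNamespace(pid, ns);
    if (!error.empty()) {
      return error;
    }
  }
  return "";
}
```

A caller would invoke enterNamespaces(taskPid, {"net"}) in the child process and abort the check if the returned string is non-empty. Joining another process' namespaces normally requires elevated privileges (such as CAP_SYS_ADMIN), and the call fails if the target pid no longer exists.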



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
