[ 
https://issues.apache.org/jira/browse/MESOS-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15934001#comment-15934001
 ] 

haosdent commented on MESOS-7210:
---------------------------------

Thanks a lot for the help, [~sielaq] and [~alexr]. Let me try to fix this.

> MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( 
> pid namespace mismatch )
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MESOS-7210
>                 URL: https://issues.apache.org/jira/browse/MESOS-7210
>             Project: Mesos
>          Issue Type: Bug
>          Components: docker
>    Affects Versions: 1.1.0
>         Environment: Ubuntu 16.04.02
> Docker version 1.13.1
> mesos 1.1.0, runs from container
> docker containers  spawned by marathon 1.4.1
>            Reporter: Wojciech Sielski
>            Assignee: haosdent
>
> When running mesos-slave with the option "docker_mesos_image", like:
> {code}
> --master=zk://standalone:2181/mesos  --containerizers=docker,mesos  
> --executor_registration_timeout=5mins  --hostname=standalone  --ip=0.0.0.0  
> --docker_stop_timeout=5secs  --gc_delay=1days  
> --docker_socket=/var/run/docker.sock  --no-systemd_enable_support  
> --work_dir=/tmp/mesos  --docker_mesos_image=panteras/paas-in-a-box:0.4.0
> {code}
> from a container that was started with the option "pid: host", like:
> {code}
>   net:        host
>   privileged: true
>   pid:        host
> {code}
> and an example Marathon job that uses MESOS_HTTP checks, like:
> {code}
> {
>  "id": "python-example-stable",
>  "cmd": "python3 -m http.server 8080",
>  "mem": 16,
>  "cpus": 0.1,
>  "instances": 2,
>  "container": {
>    "type": "DOCKER",
>    "docker": {
>      "image": "python:alpine",
>      "network": "BRIDGE",
>      "portMappings": [
>         { "containerPort": 8080, "hostPort": 0, "protocol": "tcp" }
>      ]
>    }
>  },
>  "env": {
>    "SERVICE_NAME" : "python"
>  },
>  "healthChecks": [
>    {
>      "path": "/",
>      "portIndex": 0,
>      "protocol": "MESOS_HTTP",
>      "gracePeriodSeconds": 30,
>      "intervalSeconds": 10,
>      "timeoutSeconds": 30,
>      "maxConsecutiveFailures": 3
>    }
>  ]
> }
> {code}
> I see errors like:
> {code}
> F0306 07:41:58.844293    35 health_checker.cpp:94] Failed to enter the net namespace of task (pid: '13527'): Pid 13527 does not exist
> *** Check failure stack trace: ***
>     @     0x7f51770b0c1d  google::LogMessage::Fail()
>     @     0x7f51770b29d0  google::LogMessage::SendToLog()
>     @     0x7f51770b0803  google::LogMessage::Flush()
>     @     0x7f51770b33f9  google::LogMessageFatal::~LogMessageFatal()
>     @     0x7f517647ce46  _ZNSt17_Function_handlerIFivEZN5mesos8internal6health14cloneWithSetnsERKSt8functionIS0_E6OptionIiERKSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaISG_EEEUlvE_E9_M_invokeERKSt9_Any_data
>     @     0x7f517647bf2b  mesos::internal::health::cloneWithSetns()
>     @     0x7f517648374b  std::_Function_handler<>::_M_invoke()
>     @     0x7f5177068167  process::internal::cloneChild()
>     @     0x7f5177065c32  process::subprocess()
>     @     0x7f5176481a9d  mesos::internal::health::HealthCheckerProcess::_httpHealthCheck()
>     @     0x7f51764831f7  mesos::internal::health::HealthCheckerProcess::_healthCheck()
>     @     0x7f517701f38c  process::ProcessBase::visit()
>     @     0x7f517702c8b3  process::ProcessManager::resume()
>     @     0x7f517702fb77  _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
>     @     0x7f51754ddc80  (unknown)
>     @     0x7f5174cf06ba  start_thread
>     @     0x7f5174a2682d  (unknown)
> I0306 07:41:59.077986     9 health_checker.cpp:199] Ignoring failure as health check still in grace period
> {code}
> It looks like, with the docker_mesos_image option, the newly started Mesos job 
> does not inherit the "pid: host" setting that the parent container was started 
> with, but gets its own PID namespace. So regardless of whether the parent 
> container was started with "pid: host", it will never be able to find the PID.
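The PID namespace mismatch suspected in the quoted report can be checked by hand from inside the container that runs the health check. The following is only a diagnostic sketch, not part of Mesos: the helper names (`pid_visible`, `pid_namespace`, `diagnose`) are hypothetical, and it assumes a Linux `/proc` that exposes `/proc/<pid>/ns/pid` and enough privileges to read that link for another process.

```python
import os


def pid_visible(pid, proc_root="/proc"):
    """Return True if `pid` has an entry under proc_root, i.e. the process
    is visible from the PID namespace that this /proc mount reflects."""
    return os.path.isdir(os.path.join(proc_root, str(pid)))


def pid_namespace(pid, proc_root="/proc"):
    """Return the PID namespace link target (for example 'pid:[4026531836]'),
    or None if the process is not visible or the link cannot be read."""
    try:
        return os.readlink(os.path.join(proc_root, str(pid), "ns", "pid"))
    except OSError:
        return None


def diagnose(task_pid, proc_root="/proc"):
    """Describe how `task_pid` relates to the caller's PID namespace."""
    if not pid_visible(task_pid, proc_root):
        # Matches the symptom in the log: "Pid 13527 does not exist".
        return "not visible: pid %s has no entry under %s" % (task_pid, proc_root)
    own = pid_namespace("self", proc_root)
    task = pid_namespace(task_pid, proc_root)
    if own is None or task is None:
        return "unknown: could not read the PID namespace link"
    if own != task:
        return "different namespace: %s vs %s" % (own, task)
    return "same namespace: %s" % own


if __name__ == "__main__":
    import sys
    # Usage: python3 check_pidns.py <task pid as reported in the log>
    print(diagnose(int(sys.argv[1])))
```

Running it with the task PID from the log (13527 in the report) inside the container that performs the check would be expected to report "not visible" if that container has its own PID namespace, which is consistent with the hypothesis above but does not by itself prove the cause.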



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
