[ https://issues.apache.org/jira/browse/MESOS-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Avinash Sridharan reassigned MESOS-7210:
----------------------------------------
Assignee: Gastón Kleiman
> MESOS HTTP checks don't work when mesos runs with --docker_mesos_image
> (pid namespace mismatch)
> ---------------------------------------------------------------------------------------------------
>
> Key: MESOS-7210
> URL: https://issues.apache.org/jira/browse/MESOS-7210
> Project: Mesos
> Issue Type: Bug
> Components: docker
> Affects Versions: 1.1.0
> Environment: Ubuntu 16.04.02
> Docker version 1.13.1
> mesos 1.1.0, runs from container
> docker containers spawned by marathon 1.4.1
> Reporter: Wojciech Sielski
> Assignee: Gastón Kleiman
>
> When running mesos-slave with the "docker_mesos_image" option, like:
> {code}
> --master=zk://standalone:2181/mesos --containerizers=docker,mesos
> --executor_registration_timeout=5mins --hostname=standalone --ip=0.0.0.0
> --docker_stop_timeout=5secs --gc_delay=1days
> --docker_socket=/var/run/docker.sock --no-systemd_enable_support
> --work_dir=/tmp/mesos --docker_mesos_image=panteras/paas-in-a-box:0.4.0
> {code}
> from a container that was started with the "pid: host" option, like:
> {code}
> net: host
> privileged: true
> pid: host
> {code}
> and an example Marathon job that uses MESOS_HTTP checks, like:
> {code}
> {
> "id": "python-example-stable",
> "cmd": "python3 -m http.server 8080",
> "mem": 16,
> "cpus": 0.1,
> "instances": 2,
> "container": {
> "type": "DOCKER",
> "docker": {
> "image": "python:alpine",
> "network": "BRIDGE",
> "portMappings": [
> { "containerPort": 8080, "hostPort": 0, "protocol": "tcp" }
> ]
> }
> },
> "env": {
> "SERVICE_NAME" : "python"
> },
> "healthChecks": [
> {
> "path": "/",
> "portIndex": 0,
> "protocol": "MESOS_HTTP",
> "gracePeriodSeconds": 30,
> "intervalSeconds": 10,
> "timeoutSeconds": 30,
> "maxConsecutiveFailures": 3
> }
> ]
> }
> {code}
> I see errors like:
> {code}
> F0306 07:41:58.844293 35 health_checker.cpp:94] Failed to enter the net namespace of task (pid: '13527'): Pid 13527 does not exist
> *** Check failure stack trace: ***
> @ 0x7f51770b0c1d google::LogMessage::Fail()
> @ 0x7f51770b29d0 google::LogMessage::SendToLog()
> @ 0x7f51770b0803 google::LogMessage::Flush()
> @ 0x7f51770b33f9 google::LogMessageFatal::~LogMessageFatal()
> @ 0x7f517647ce46 _ZNSt17_Function_handlerIFivEZN5mesos8internal6health14cloneWithSetnsERKSt8functionIS0_E6OptionIiERKSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaISG_EEEUlvE_E9_M_invokeERKSt9_Any_data
> @ 0x7f517647bf2b mesos::internal::health::cloneWithSetns()
> @ 0x7f517648374b std::_Function_handler<>::_M_invoke()
> @ 0x7f5177068167 process::internal::cloneChild()
> @ 0x7f5177065c32 process::subprocess()
> @ 0x7f5176481a9d mesos::internal::health::HealthCheckerProcess::_httpHealthCheck()
> @ 0x7f51764831f7 mesos::internal::health::HealthCheckerProcess::_healthCheck()
> @ 0x7f517701f38c process::ProcessBase::visit()
> @ 0x7f517702c8b3 process::ProcessManager::resume()
> @ 0x7f517702fb77 _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
> @ 0x7f51754ddc80 (unknown)
> @ 0x7f5174cf06ba start_thread
> @ 0x7f5174a2682d (unknown)
> I0306 07:41:59.077986 9 health_checker.cpp:199] Ignoring failure as health check still in grace period
> {code}
> It looks like the docker_mesos_image option causes the newly started mesos
> job to get its own PID namespace instead of using the "pid: host" setting
> that the parent container was started with. So regardless of whether the
> parent container was started with "pid: host", the health check will never
> be able to find the task's PID.
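One way to confirm the suspected mismatch is to compare PID namespaces from inside the container that runs the health check. The following is a minimal sketch, not how Mesos itself performs the check; the function names and the `proc_root` parameter are hypothetical, and it assumes a Linux procfs that exposes `/proc/<pid>/ns/pid`.

```python
import os


def pid_visible(pid, proc_root="/proc"):
    """Return True if `pid` has an entry under proc_root, i.e. it is
    visible from the PID namespace this procfs was mounted for."""
    return os.path.isdir(os.path.join(proc_root, str(pid)))


def pid_namespace(pid, proc_root="/proc"):
    """Return the PID namespace link target (e.g. 'pid:[4026531836]'),
    or None if the process is not visible or the link is unreadable."""
    try:
        return os.readlink(os.path.join(proc_root, str(pid), "ns", "pid"))
    except OSError:
        return None


def diagnose(task_pid, proc_root="/proc"):
    """Describe how `task_pid` relates to the caller's PID namespace."""
    own = pid_namespace("self", proc_root)
    if not pid_visible(task_pid, proc_root):
        # This is the situation the "Pid ... does not exist" error suggests.
        return "pid %s is not visible here (own namespace: %s)" % (task_pid, own)
    target = pid_namespace(task_pid, proc_root)
    if target is None:
        return "pid %s is visible, but its namespace could not be read" % task_pid
    if target == own:
        return "pid %s shares this PID namespace (%s)" % (task_pid, own)
    return "pid %s is in a different PID namespace (%s vs %s)" % (task_pid, target, own)
```

Calling `diagnose(13527)` from the health checker's container and again from the host would show whether the task PID is only visible on the host side.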
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)