[ https://issues.apache.org/jira/browse/MESOS-7361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16243758#comment-16243758 ]
Armand Grillet commented on MESOS-7361: --------------------------------------- This issue is due to the fact that if the check is a COMMAND check, the checker will delegate the execution of the check to the Mesos agent via the {{LaunchNestedContainerSession}} API call: https://github.com/apache/mesos/blob/b13c4c3683fd6bad702a7fb9e24cfc3414b921da/src/checks/checker_process.cpp#L555? If we use the universal containerizer, this means following that workflow: https://docs.google.com/document/d/1FtcyQkDfGp-bPHTW4pUoqQCgVlPde936bo-IIENO_ho/edit# We do not differentiate regular nested containers and health checks (that are periodically started nested containers), this results in 26 lines of logs for one command health check on a Mesos agent due to its nested container state: {code} Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.118629 4727 containerizer.cpp:1056] Trying to chown '/var/lib/mesos/slave/slaves/d71d7fe2-348d-4091-9c26-445d103b3cf1-S1/frameworks/d71d7fe2-348d-4091-9c26-445d103b3cf1-0001/executors/instance-pod-with-healthcheck.8c197d66-b347-11e7-a146-42c7ec5fa9b9/runs/61a37663-ebbe-4e7e-8a0d-9dd7918b2348/containers/ba6aecb9-1b08-4356-af26-7a42a32043ff/containers/check-5517dae6-9d41-492e-b5ae-43bacc8692bf' to user 'root' Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.118783 4727 containerizer.cpp:1085] Starting container 61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.119021 4727 containerizer.cpp:2714] Transitioning the state of container 61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf from PROVISIONING to PREPARING Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.144028 4721 switchboard.cpp:575] Created I/O switchboard server (pid: 6669) listening on socket file '/tmp/mesos-io-switchboard-2ee398cd-f448-4bb9-9d80-e57069b9c199' for container 61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.171833 4723 linux_launcher.cpp:429] Launching nested container 61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf and cloning with namespaces Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.203270 4725 containerizer.cpp:2714] Transitioning the state of container 61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf from PREPARING to ISOLATING Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.206295 4725 containerizer.cpp:2714] Transitioning the state of container 61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf from ISOLATING to FETCHING Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.206835 4729 containerizer.cpp:2714] Transitioning the state of container 61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf from FETCHING to RUNNING Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.241371 4728 http.cpp:2854] Received EOF attach response for 61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.242112 4725 containerizer.cpp:2168] Destroying container 61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf in RUNNING state Oct 30 14:43:15 mesos-agent[4718]: W1017 14:43:15.242159 4721 http.cpp:2869] Launch nested container session connection for container 61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf closed Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.242184 4725 containerizer.cpp:2714] Transitioning the state of container 61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf from RUNNING to DESTROYING Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.242352 4729 linux_launcher.cpp:505] Asked to destroy container 61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.243599 4729 linux_launcher.cpp:548] Using freezer to destroy cgroup mesos/61a37663-ebbe-4e7e-8a0d-9dd7918b2348/mesos/ba6aecb9-1b08-4356-af26-7a42a32043ff/mesos/check-5517dae6-9d41-492e-b5ae-43bacc8692bf Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.246537 4724 cgroups.cpp:3055] Freezing cgroup /sys/fs/cgroup/freezer/mesos/61a37663-ebbe-4e7e-8a0d-9dd7918b2348/mesos/ba6aecb9-1b08-4356-af26-7a42a32043ff/mesos/check-5517dae6-9d41-492e-b5ae-43bacc8692bf Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.250340 4722 cgroups.cpp:1413] Successfully froze cgroup /sys/fs/cgroup/freezer/mesos/61a37663-ebbe-4e7e-8a0d-9dd7918b2348/mesos/ba6aecb9-1b08-4356-af26-7a42a32043ff/mesos/check-5517dae6-9d41-492e-b5ae-43bacc8692bf after 3.11808ms Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.253630 4729 cgroups.cpp:3073] Thawing cgroup /sys/fs/cgroup/freezer/mesos/61a37663-ebbe-4e7e-8a0d-9dd7918b2348/mesos/ba6aecb9-1b08-4356-af26-7a42a32043ff/mesos/check-5517dae6-9d41-492e-b5ae-43bacc8692bf Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.256338 4722 cgroups.cpp:1442] Successfully thawed cgroup /sys/fs/cgroup/freezer/mesos/61a37663-ebbe-4e7e-8a0d-9dd7918b2348/mesos/ba6aecb9-1b08-4356-af26-7a42a32043ff/mesos/check-5517dae6-9d41-492e-b5ae-43bacc8692bf after 2.663936ms Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.336752 4729 switchboard.cpp:888] I/O switchboard server process for container 61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf has terminated (status=0) Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.336876 4725 containerizer.cpp:2614] Container 61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf has exited Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.338035 4724 container_assigner.cpp:101] Unregistering container_id[value: "check-5517dae6-9d41-492e-b5ae-43bacc8692bf" parent { value: "ba6aecb9-1b08-4356-af26-7a42a32043ff" parent { value: "61a37663-ebbe-4e7e-8a0d-9dd7918b2348" } }]. Oct 30 14:43:15 mesos-agent[4718]: W1017 14:43:15.338174 4843 container_assigner_strategy.cpp:192] No ephemeral-port reader had been assigned to container[value: "check-5517dae6-9d41-492e-b5ae-43bacc8692bf" parent { value: "ba6aecb9-1b08-4356-af26-7a42a32043ff" parent { value: "61a37663-ebbe-4e7e-8a0d-9dd7918b2348" } }], cannot unregister Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.338228 4843 container_state_cache_impl.cpp:145] Removing container file[/run/dcos/mesos/isolators/com_mesosphere_MetricsIsolatorModule/containers/check-5517dae6-9d41-492e-b5ae-43bacc8692bf] Oct 30 14:43:15 mesos-agent[4718]: E1017 14:43:15.338282 4843 container_state_cache_impl.cpp:148] Failed to remove container file[/run/dcos/mesos/isolators/com_mesosphere_MetricsIsolatorModule/containers/check-5517dae6-9d41-492e-b5ae-43bacc8692bf]: No such file or directory Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.338367 4725 containerizer.cpp:2452] Checkpointing termination state to nested container's runtime directory '/var/run/mesos/containers/61a37663-ebbe-4e7e-8a0d-9dd7918b2348/containers/ba6aecb9-1b08-4356-af26-7a42a32043ff/containers/check-5517dae6-9d41-492e-b5ae-43bacc8692bf/termination' {code} This is excessive as, for a cluster running 100 tasks each having a health check running every 30 seconds, this would mean around 5200 lines of logs per minute. > Command checks via agent pollute agent logs. > -------------------------------------------- > > Key: MESOS-7361 > URL: https://issues.apache.org/jira/browse/MESOS-7361 > Project: Mesos > Issue Type: Improvement > Components: agent > Reporter: Alexander Rukletsov > Assignee: Armand Grillet > Labels: check, health-check, mesosphere > > Command checks via agent leverage debug container API of the agent to start > checks. Each such invocation triggers a bunch of logs on the agent, because > the API was not originally designed with periodic invocations in mind. We > should find a way to avoid excessive logging on the agent. -- This message was sent by Atlassian JIRA (v6.4.14#64029)