[ 
https://issues.apache.org/jira/browse/MESOS-7361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16243758#comment-16243758
 ] 

Armand Grillet commented on MESOS-7361:
---------------------------------------

This issue is due to the fact that if the check is a COMMAND check, the checker 
will delegate the execution of the check to the Mesos agent via the 
{{LaunchNestedContainerSession}} API call: 
https://github.com/apache/mesos/blob/b13c4c3683fd6bad702a7fb9e24cfc3414b921da/src/checks/checker_process.cpp#L555?
 If we use the universal containerizer, this means following that workflow: 
https://docs.google.com/document/d/1FtcyQkDfGp-bPHTW4pUoqQCgVlPde936bo-IIENO_ho/edit#
 

We do not differentiate regular nested containers and health checks (that are 
periodically started nested containers), this results in 26 lines of logs for 
one command health check on a Mesos agent due to its nested container state:

{code}
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.118629  4727 
containerizer.cpp:1056] Trying to chown 
'/var/lib/mesos/slave/slaves/d71d7fe2-348d-4091-9c26-445d103b3cf1-S1/frameworks/d71d7fe2-348d-4091-9c26-445d103b3cf1-0001/executors/instance-pod-with-healthcheck.8c197d66-b347-11e7-a146-42c7ec5fa9b9/runs/61a37663-ebbe-4e7e-8a0d-9dd7918b2348/containers/ba6aecb9-1b08-4356-af26-7a42a32043ff/containers/check-5517dae6-9d41-492e-b5ae-43bacc8692bf'
 to user 'root'
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.118783  4727 
containerizer.cpp:1085] Starting container 
61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.119021  4727 
containerizer.cpp:2714] Transitioning the state of container 
61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf
 from PROVISIONING to PREPARING
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.144028  4721 
switchboard.cpp:575] Created I/O switchboard server (pid: 6669) listening on 
socket file '/tmp/mesos-io-switchboard-2ee398cd-f448-4bb9-9d80-e57069b9c199' 
for container 
61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.171833  4723 
linux_launcher.cpp:429] Launching nested container 
61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf
 and cloning with namespaces
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.203270  4725 
containerizer.cpp:2714] Transitioning the state of container 
61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf
 from PREPARING to ISOLATING
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.206295  4725 
containerizer.cpp:2714] Transitioning the state of container 
61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf
 from ISOLATING to FETCHING
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.206835  4729 
containerizer.cpp:2714] Transitioning the state of container 
61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf
 from FETCHING to RUNNING
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.241371  4728 http.cpp:2854] 
Received EOF attach response for 
61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.242112  4725 
containerizer.cpp:2168] Destroying container
61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf
 in RUNNING state
Oct 30 14:43:15 mesos-agent[4718]: W1017 14:43:15.242159  4721 http.cpp:2869] 
Launch nested container session connection for container 
61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf
 closed
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.242184  4725 
containerizer.cpp:2714] Transitioning the state of container 
61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf
 from RUNNING to DESTROYING
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.242352  4729 
linux_launcher.cpp:505] Asked to destroy container 
61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.243599  4729 
linux_launcher.cpp:548] Using freezer to destroy cgroup 
mesos/61a37663-ebbe-4e7e-8a0d-9dd7918b2348/mesos/ba6aecb9-1b08-4356-af26-7a42a32043ff/mesos/check-5517dae6-9d41-492e-b5ae-43bacc8692bf
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.246537  4724 
cgroups.cpp:3055] Freezing cgroup 
/sys/fs/cgroup/freezer/mesos/61a37663-ebbe-4e7e-8a0d-9dd7918b2348/mesos/ba6aecb9-1b08-4356-af26-7a42a32043ff/mesos/check-5517dae6-9d41-492e-b5ae-43bacc8692bf
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.250340  4722 
cgroups.cpp:1413] Successfully froze cgroup 
/sys/fs/cgroup/freezer/mesos/61a37663-ebbe-4e7e-8a0d-9dd7918b2348/mesos/ba6aecb9-1b08-4356-af26-7a42a32043ff/mesos/check-5517dae6-9d41-492e-b5ae-43bacc8692bf
 after 3.11808ms
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.253630  4729 
cgroups.cpp:3073] Thawing cgroup 
/sys/fs/cgroup/freezer/mesos/61a37663-ebbe-4e7e-8a0d-9dd7918b2348/mesos/ba6aecb9-1b08-4356-af26-7a42a32043ff/mesos/check-5517dae6-9d41-492e-b5ae-43bacc8692bf
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.256338  4722 
cgroups.cpp:1442] Successfully thawed cgroup 
/sys/fs/cgroup/freezer/mesos/61a37663-ebbe-4e7e-8a0d-9dd7918b2348/mesos/ba6aecb9-1b08-4356-af26-7a42a32043ff/mesos/check-5517dae6-9d41-492e-b5ae-43bacc8692bf
 after 2.663936ms
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.336752  4729 
switchboard.cpp:888] I/O switchboard server process for container 
61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf
 has terminated (status=0)
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.336876  4725 
containerizer.cpp:2614] Container 
61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf
 has exited
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.338035  4724 
container_assigner.cpp:101] Unregistering container_id[value: 
"check-5517dae6-9d41-492e-b5ae-43bacc8692bf" parent { value: 
"ba6aecb9-1b08-4356-af26-7a42a32043ff" parent { value: 
"61a37663-ebbe-4e7e-8a0d-9dd7918b2348" } }].
Oct 30 14:43:15 mesos-agent[4718]: W1017 14:43:15.338174  4843 
container_assigner_strategy.cpp:192] No ephemeral-port reader had been assigned 
to container[value: "check-5517dae6-9d41-492e-b5ae-43bacc8692bf" parent { 
value: "ba6aecb9-1b08-4356-af26-7a42a32043ff" parent { value: 
"61a37663-ebbe-4e7e-8a0d-9dd7918b2348" } }], cannot unregister
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.338228  4843 
container_state_cache_impl.cpp:145] Removing container 
file[/run/dcos/mesos/isolators/com_mesosphere_MetricsIsolatorModule/containers/check-5517dae6-9d41-492e-b5ae-43bacc8692bf]
Oct 30 14:43:15 mesos-agent[4718]: E1017 14:43:15.338282  4843 
container_state_cache_impl.cpp:148] Failed to remove container 
file[/run/dcos/mesos/isolators/com_mesosphere_MetricsIsolatorModule/containers/check-5517dae6-9d41-492e-b5ae-43bacc8692bf]:
 No such file or directory
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.338367  4725 
containerizer.cpp:2452] Checkpointing termination state to nested container's 
runtime directory 
'/var/run/mesos/containers/61a37663-ebbe-4e7e-8a0d-9dd7918b2348/containers/ba6aecb9-1b08-4356-af26-7a42a32043ff/containers/check-5517dae6-9d41-492e-b5ae-43bacc8692bf/termination'
{code}

This is excessive as, for a cluster running 100 tasks each having a health 
check running every 30 seconds, this would mean around 5200 lines of logs per 
minute.


> Command checks via agent pollute agent logs.
> --------------------------------------------
>
>                 Key: MESOS-7361
>                 URL: https://issues.apache.org/jira/browse/MESOS-7361
>             Project: Mesos
>          Issue Type: Improvement
>          Components: agent
>            Reporter: Alexander Rukletsov
>            Assignee: Armand Grillet
>              Labels: check, health-check, mesosphere
>
> Command checks via agent leverage debug container API of the agent to start 
> checks. Each such invocation triggers a bunch of logs on the agent, because 
> the API was not originally designed with periodic invocations in mind. We 
> should find a way to avoid excessive logging on the agent.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to