[ 
https://issues.apache.org/jira/browse/MESOS-8004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gilbert Song updated MESOS-8004:
--------------------------------
    Comment: was deleted

(was: [~highfly], what's your Mesos version?)

> Failed to kill all processes in the container due to cgroup freeze failure
> --------------------------------------------------------------------------
>
>                 Key: MESOS-8004
>                 URL: https://issues.apache.org/jira/browse/MESOS-8004
>             Project: Mesos
>          Issue Type: Bug
>          Components: agent, containerization
>    Affects Versions: 1.2.1
>         Environment: CentOS Linux release 7.2.1511 (Core) 
> 3.10.0-327.36.3.el7.x86_64
>            Reporter: Haiwei Zhou
>              Labels: launcher
>
> When using the Mesos unified containerizer, the executor cannot be destroyed 
> because the cgroup freeze operation fails. The agent logs show that the 
> launcher tries to freeze the cgroup several times and then times out. However, 
> the content of 
> /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8/freezer.state
>  is "FROZEN".
> {quote}
> I0921 18:00:58.339440  3493 containerizer.cpp:2465] Container 
> e2778ccd-c7e5-4289-b382-e05f063200d8 has exited
> I0921 18:00:58.339519  3493 containerizer.cpp:2102] Destroying container 
> e2778ccd-c7e5-4289-b382-e05f063200d8 in RUNNING state
> I0921 18:00:58.339645  3484 linux_launcher.cpp:505] Asked to destroy 
> container e2778ccd-c7e5-4289-b382-e05f063200d8
> I0921 18:00:58.340553  3484 linux_launcher.cpp:548] Using freezer to destroy 
> cgroup mesos/e2778ccd-c7e5-4289-b382-e05f063200d8
> I0921 18:00:58.342226  3493 cgroups.cpp:2692] Freezing cgroup 
> /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8
> I0921 18:01:00.042708  3475 slave.cpp:5155] Killing executor 
> '47eb9350-9ab4-41f8-a5cd-39e855532b53' of framework 
> 23aad131-26f7-44fd-9baa-dfb55e3e3926-0110 at executor(1)@172.29.0.18:40108
> I0921 18:01:02.009097  3483 process.cpp:3704] Handling HTTP event for process 
> 'slave(1)' with path: '/slave(1)/containers'
> W0921 18:01:02.011672  3491 containerizer.cpp:2055] Skipping status for 
> container e2778ccd-c7e5-4289-b382-e05f063200d8 because: Container does not 
> exist
> I0921 18:01:04.269701  3487 slave.cpp:5732] Querying resource estimator for 
> oversubscribable resources
> I0921 18:01:04.269775  3487 slave.cpp:5266] Current disk usage 0.11%. Max 
> allowed age: 6.292478769607581days
> I0921 18:01:04.270349  3506 slave.cpp:5746] Received oversubscribable 
> resources {} from the resource estimator
> I0921 18:01:08.300772  3474 slave.cpp:4346] Received ping from 
> slave-observer(30)@10.16.85.66:5050
> I0921 18:01:08.345176  3517 cgroups.cpp:2710] Thawing cgroup 
> /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8
> I0921 18:01:08.347452  3517 cgroups.cpp:1434] Successfully thawed cgroup 
> /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8 after 
> 2.183168ms
> I0921 18:01:08.347561  3517 cgroups.cpp:2692] Freezing cgroup 
> /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8
> E0921 18:01:15.192441  3524 perf_event.cpp:176] Perf sample of 10secs failed 
> to complete within 12secs; sampling will be halted
> E0921 18:01:15.192819  3489 perf_event.cpp:199] Failed to get the perf 
> sample: timeout
> I0921 18:01:18.350342  3488 cgroups.cpp:2710] Thawing cgroup 
> /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8
> I0921 18:01:18.352532  3488 cgroups.cpp:1434] Successfully thawed cgroup 
> /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8 after 
> 2.121984ms
> I0921 18:01:18.352646  3481 cgroups.cpp:2692] Freezing cgroup 
> /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8
> I0921 18:01:19.301443  3520 slave.cpp:5732] Querying resource estimator for 
> oversubscribable resources
> I0921 18:01:19.301566  3501 slave.cpp:5746] Received oversubscribable 
> resources {} from the resource estimator
> I0921 18:01:23.307291  3518 slave.cpp:4346] Received ping from 
> slave-observer(30)@10.16.85.66:5050
> I0921 18:01:28.121094  3491 process.cpp:3704] Handling HTTP event for process 
> 'metrics' with path: '/metrics/snapshot'
> I0921 18:01:28.355551  3493 cgroups.cpp:2710] Thawing cgroup 
> /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8
> I0921 18:01:28.357792  3493 cgroups.cpp:1434] Successfully thawed cgroup 
> /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8 after 
> 2.177024ms
> I0921 18:01:28.357890  3493 cgroups.cpp:2692] Freezing cgroup 
> /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8
> I0921 18:01:34.302625  3503 slave.cpp:5732] Querying resource estimator for 
> oversubscribable resources
> I0921 18:01:34.302738  3483 slave.cpp:5746] Received oversubscribable 
> resources {} from the resource estimator
> I0921 18:01:38.315979  3505 slave.cpp:4346] Received ping from 
> slave-observer(30)@10.16.85.66:5050
> I0921 18:01:38.360709  3511 cgroups.cpp:2710] Thawing cgroup 
> /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8
> I0921 18:01:38.362891  3511 cgroups.cpp:1434] Successfully thawed cgroup 
> /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8 after 
> 2.12608ms
> I0921 18:01:38.362993  3475 cgroups.cpp:2692] Freezing cgroup 
> /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8
> I0921 18:01:48.366251  3492 cgroups.cpp:2710] Thawing cgroup 
> /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8
> I0921 18:01:48.368404  3496 cgroups.cpp:1434] Successfully thawed cgroup 
> /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8 after 
> 2.080256ms
> I0921 18:01:48.368501  3496 cgroups.cpp:2692] Freezing cgroup 
> /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8
> E0921 18:01:58.342779  3478 slave.cpp:4746] Termination of executor 
> '47eb9350-9ab4-41f8-a5cd-39e855532b53' of framework 
> 23aad131-26f7-44fd-9baa-dfb55e3e3926-0110 failed: Failed to kill all 
> processes in the container: Timed out after 1mins
> I0921 18:01:58.342830  3478 slave.cpp:4868] Cleaning up executor 
> '47eb9350-9ab4-41f8-a5cd-39e855532b53' of framework 
> 23aad131-26f7-44fd-9baa-dfb55e3e3926-0110 at executor(1)@172.29.0.18:40108
> I0921 18:01:58.364516  3475 gc.cpp:55] Scheduling 
> '/data/mesos/slaves/23aad131-26f7-44fd-9baa-dfb55e3e3926-S5/frameworks/23aad131-26f7-44fd-9baa-dfb55e3e3926-0110/executors/47eb9350-9ab4-41f8-a5cd-39e855532b53/runs/e2778ccd-c7e5-4289-b382-e05f063200d8'
>  for gc 6.99999578195556days in the future
> I0921 18:01:58.364591  3475 gc.cpp:55] Scheduling 
> '/data/mesos/slaves/23aad131-26f7-44fd-9baa-dfb55e3e3926-S5/frameworks/23aad131-26f7-44fd-9baa-dfb55e3e3926-0110/executors/47eb9350-9ab4-41f8-a5cd-39e855532b53'
>  for gc 6.9999957811437days in the future
> I0921 18:01:58.364604  3478 slave.cpp:4956] Cleaning up framework 
> 23aad131-26f7-44fd-9baa-dfb55e3e3926-0110
> I0921 18:01:58.364615  3475 gc.cpp:55] Scheduling 
> '/data/mesos/meta/slaves/23aad131-26f7-44fd-9baa-dfb55e3e3926-S5/frameworks/23aad131-26f7-44fd-9baa-dfb55e3e3926-0110/executors/47eb9350-9ab4-41f8-a5cd-39e855532b53/runs/e2778ccd-c7e5-4289-b382-e05f063200d8'
>  for gc 6.99999578062519days in the future
> I0921 18:01:58.364670  3475 gc.cpp:55] Scheduling 
> '/data/mesos/meta/slaves/23aad131-26f7-44fd-9baa-dfb55e3e3926-S5/frameworks/23aad131-26f7-44fd-9baa-dfb55e3e3926-0110/executors/47eb9350-9ab4-41f8-a5cd-39e855532b53'
>  for gc 6.99999578024296days in the future
> I0921 18:01:58.364683  3479 status_update_manager.cpp:285] Closing status 
> update streams for framework 23aad131-26f7-44fd-9baa-dfb55e3e3926-0110
> I0921 18:01:58.364702  3475 gc.cpp:55] Scheduling 
> '/data/mesos/slaves/23aad131-26f7-44fd-9baa-dfb55e3e3926-S5/frameworks/23aad131-26f7-44fd-9baa-dfb55e3e3926-0110'
>  for gc 6.9999957791437days in the future
> I0921 18:01:58.364725  3479 status_update_manager.cpp:531] Cleaning up status 
> update stream for task 47eb9350-9ab4-41f8-a5cd-39e855532b53 of framework 
> 23aad131-26f7-44fd-9baa-dfb55e3e3926-0110
> I0921 18:01:58.364740  3475 gc.cpp:55] Scheduling 
> '/data/mesos/meta/slaves/23aad131-26f7-44fd-9baa-dfb55e3e3926-S5/frameworks/23aad131-26f7-44fd-9baa-dfb55e3e3926-0110'
>  for gc 6.99999577881778days in the future
> {quote}
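
The mismatch reported above (the agent times out on the freeze while
freezer.state reads "FROZEN") can be double-checked by hand on the affected
host. The following is only a minimal sketch, assuming a cgroup v1 freezer
hierarchy laid out as in the logs; the helper name inspect_freezer_cgroup is
hypothetical and is not part of Mesos. It reads the freezer.state and
cgroup.procs files of one cgroup directory and returns what they contain.

```python
import os
from typing import List, Tuple


def inspect_freezer_cgroup(cgroup_dir: str) -> Tuple[str, List[int]]:
    """Return (freezer state, PIDs) for a cgroup v1 freezer directory.

    Hypothetical helper for manual diagnosis; not Mesos code.
    """
    # freezer.state holds one of THAWED, FREEZING or FROZEN.
    with open(os.path.join(cgroup_dir, "freezer.state")) as f:
        state = f.read().strip()

    # cgroup.procs lists one process ID per line for this cgroup.
    with open(os.path.join(cgroup_dir, "cgroup.procs")) as f:
        pids = [int(line) for line in f if line.strip()]

    return state, pids
```

Calling it with the directory from the report, for example
inspect_freezer_cgroup("/sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8"),
would show whether the kernel reports the cgroup as frozen and which
processes are still attached to it at that moment.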



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
