[ https://issues.apache.org/jira/browse/MESOS-8004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gilbert Song updated MESOS-8004:
--------------------------------
    Comment: was deleted

(was: [~highfly], what's your Mesos version?)

> Failed to kill all processes in the container due to cgroup freeze failure
> --------------------------------------------------------------------------
>
>                 Key: MESOS-8004
>                 URL: https://issues.apache.org/jira/browse/MESOS-8004
>             Project: Mesos
>          Issue Type: Bug
>          Components: agent, containerization
>    Affects Versions: 1.2.1
>         Environment: CentOS Linux release 7.2.1511 (Core)
>                      3.10.0-327.36.3.el7.x86_64
>            Reporter: Haiwei Zhou
>              Labels: launcher
>
> When using the Mesos unified containerizer, the executor cannot be destroyed
> because the cgroup freeze operation fails. The agent logs show that the
> launcher tries to freeze the cgroup several times before a timeout occurs.
> However, the content of
> /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8/freezer.state
> is "FROZEN".
> {quote}
> I0921 18:00:58.339440 3493 containerizer.cpp:2465] Container e2778ccd-c7e5-4289-b382-e05f063200d8 has exited
> I0921 18:00:58.339519 3493 containerizer.cpp:2102] Destroying container e2778ccd-c7e5-4289-b382-e05f063200d8 in RUNNING state
> I0921 18:00:58.339645 3484 linux_launcher.cpp:505] Asked to destroy container e2778ccd-c7e5-4289-b382-e05f063200d8
> I0921 18:00:58.340553 3484 linux_launcher.cpp:548] Using freezer to destroy cgroup mesos/e2778ccd-c7e5-4289-b382-e05f063200d8
> I0921 18:00:58.342226 3493 cgroups.cpp:2692] Freezing cgroup /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8
> I0921 18:01:00.042708 3475 slave.cpp:5155] Killing executor '47eb9350-9ab4-41f8-a5cd-39e855532b53' of framework 23aad131-26f7-44fd-9baa-dfb55e3e3926-0110 at executor(1)@172.29.0.18:40108
> I0921 18:01:02.009097 3483 process.cpp:3704] Handling HTTP event for process 'slave(1)' with path: '/slave(1)/containers'
> W0921 18:01:02.011672 3491 containerizer.cpp:2055] Skipping status for container e2778ccd-c7e5-4289-b382-e05f063200d8 because: Container does not exist
> I0921 18:01:04.269701 3487 slave.cpp:5732] Querying resource estimator for oversubscribable resources
> I0921 18:01:04.269775 3487 slave.cpp:5266] Current disk usage 0.11%. Max allowed age: 6.292478769607581days
> I0921 18:01:04.270349 3506 slave.cpp:5746] Received oversubscribable resources {} from the resource estimator
> I0921 18:01:08.300772 3474 slave.cpp:4346] Received ping from slave-observer(30)@10.16.85.66:5050
> I0921 18:01:08.345176 3517 cgroups.cpp:2710] Thawing cgroup /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8
> I0921 18:01:08.347452 3517 cgroups.cpp:1434] Successfully thawed cgroup /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8 after 2.183168ms
> I0921 18:01:08.347561 3517 cgroups.cpp:2692] Freezing cgroup /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8
> E0921 18:01:15.192441 3524 perf_event.cpp:176] Perf sample of 10secs failed to complete within 12secs; sampling will be halted
> E0921 18:01:15.192819 3489 perf_event.cpp:199] Failed to get the perf sample: timeout
> I0921 18:01:18.350342 3488 cgroups.cpp:2710] Thawing cgroup /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8
> I0921 18:01:18.352532 3488 cgroups.cpp:1434] Successfully thawed cgroup /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8 after 2.121984ms
> I0921 18:01:18.352646 3481 cgroups.cpp:2692] Freezing cgroup /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8
> I0921 18:01:19.301443 3520 slave.cpp:5732] Querying resource estimator for oversubscribable resources
> I0921 18:01:19.301566 3501 slave.cpp:5746] Received oversubscribable resources {} from the resource estimator
> I0921 18:01:23.307291 3518 slave.cpp:4346] Received ping from slave-observer(30)@10.16.85.66:5050
> I0921 18:01:28.121094 3491 process.cpp:3704] Handling HTTP event for process 'metrics' with path: '/metrics/snapshot'
> I0921 18:01:28.355551 3493 cgroups.cpp:2710] Thawing cgroup /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8
> I0921 18:01:28.357792 3493 cgroups.cpp:1434] Successfully thawed cgroup /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8 after 2.177024ms
> I0921 18:01:28.357890 3493 cgroups.cpp:2692] Freezing cgroup /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8
> I0921 18:01:34.302625 3503 slave.cpp:5732] Querying resource estimator for oversubscribable resources
> I0921 18:01:34.302738 3483 slave.cpp:5746] Received oversubscribable resources {} from the resource estimator
> I0921 18:01:38.315979 3505 slave.cpp:4346] Received ping from slave-observer(30)@10.16.85.66:5050
> I0921 18:01:38.360709 3511 cgroups.cpp:2710] Thawing cgroup /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8
> I0921 18:01:38.362891 3511 cgroups.cpp:1434] Successfully thawed cgroup /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8 after 2.12608ms
> I0921 18:01:38.362993 3475 cgroups.cpp:2692] Freezing cgroup /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8
> I0921 18:01:48.366251 3492 cgroups.cpp:2710] Thawing cgroup /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8
> I0921 18:01:48.368404 3496 cgroups.cpp:1434] Successfully thawed cgroup /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8 after 2.080256ms
> I0921 18:01:48.368501 3496 cgroups.cpp:2692] Freezing cgroup /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8
> E0921 18:01:58.342779 3478 slave.cpp:4746] Termination of executor '47eb9350-9ab4-41f8-a5cd-39e855532b53' of framework 23aad131-26f7-44fd-9baa-dfb55e3e3926-0110 failed: Failed to kill all processes in the container: Timed out after 1mins
> I0921 18:01:58.342830 3478 slave.cpp:4868] Cleaning up executor '47eb9350-9ab4-41f8-a5cd-39e855532b53' of framework 23aad131-26f7-44fd-9baa-dfb55e3e3926-0110 at executor(1)@172.29.0.18:40108
> I0921 18:01:58.364516 3475 gc.cpp:55] Scheduling '/data/mesos/slaves/23aad131-26f7-44fd-9baa-dfb55e3e3926-S5/frameworks/23aad131-26f7-44fd-9baa-dfb55e3e3926-0110/executors/47eb9350-9ab4-41f8-a5cd-39e855532b53/runs/e2778ccd-c7e5-4289-b382-e05f063200d8' for gc 6.99999578195556days in the future
> I0921 18:01:58.364591 3475 gc.cpp:55] Scheduling '/data/mesos/slaves/23aad131-26f7-44fd-9baa-dfb55e3e3926-S5/frameworks/23aad131-26f7-44fd-9baa-dfb55e3e3926-0110/executors/47eb9350-9ab4-41f8-a5cd-39e855532b53' for gc 6.9999957811437days in the future
> I0921 18:01:58.364604 3478 slave.cpp:4956] Cleaning up framework 23aad131-26f7-44fd-9baa-dfb55e3e3926-0110
> I0921 18:01:58.364615 3475 gc.cpp:55] Scheduling '/data/mesos/meta/slaves/23aad131-26f7-44fd-9baa-dfb55e3e3926-S5/frameworks/23aad131-26f7-44fd-9baa-dfb55e3e3926-0110/executors/47eb9350-9ab4-41f8-a5cd-39e855532b53/runs/e2778ccd-c7e5-4289-b382-e05f063200d8' for gc 6.99999578062519days in the future
> I0921 18:01:58.364670 3475 gc.cpp:55] Scheduling '/data/mesos/meta/slaves/23aad131-26f7-44fd-9baa-dfb55e3e3926-S5/frameworks/23aad131-26f7-44fd-9baa-dfb55e3e3926-0110/executors/47eb9350-9ab4-41f8-a5cd-39e855532b53' for gc 6.99999578024296days in the future
> I0921 18:01:58.364683 3479 status_update_manager.cpp:285] Closing status update streams for framework 23aad131-26f7-44fd-9baa-dfb55e3e3926-0110
> I0921 18:01:58.364702 3475 gc.cpp:55] Scheduling '/data/mesos/slaves/23aad131-26f7-44fd-9baa-dfb55e3e3926-S5/frameworks/23aad131-26f7-44fd-9baa-dfb55e3e3926-0110' for gc 6.9999957791437days in the future
> I0921 18:01:58.364725 3479 status_update_manager.cpp:531] Cleaning up status update stream for task 47eb9350-9ab4-41f8-a5cd-39e855532b53 of framework 23aad131-26f7-44fd-9baa-dfb55e3e3926-0110
> I0921 18:01:58.364740 3475 gc.cpp:55] Scheduling '/data/mesos/meta/slaves/23aad131-26f7-44fd-9baa-dfb55e3e3926-S5/frameworks/23aad131-26f7-44fd-9baa-dfb55e3e3926-0110' for gc 6.99999577881778days in the future
> {quote}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
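
The report above describes freeze attempts that time out even though freezer.state reads "FROZEN". One way to inspect that discrepancy by hand is to read the freezer state together with the kernel state of each process still listed in the cgroup. The following is only a minimal sketch, assuming a cgroup v1 freezer hierarchy such as the one on the reporter's CentOS 7 host. The function name freezer_snapshot and its proc_root parameter are hypothetical and are not part of Mesos.

```python
from pathlib import Path


def freezer_snapshot(cgroup_dir, proc_root="/proc"):
    """Return (freezer_state, {pid: process_state}) for a cgroup v1 freezer dir.

    freezer_state is the content of freezer.state (THAWED, FREEZING or FROZEN).
    process_state is the "State:" field of /proc/<pid>/status, or "gone" if the
    process exited between listing the cgroup and reading its status.
    """
    cgroup = Path(cgroup_dir)
    state = (cgroup / "freezer.state").read_text().strip()

    # cgroup.procs lists one PID per line for every process in the cgroup.
    pids = [int(p) for p in (cgroup / "cgroup.procs").read_text().split()]

    procs = {}
    for pid in pids:
        status_file = Path(proc_root) / str(pid) / "status"
        try:
            lines = status_file.read_text().splitlines()
        except OSError:
            procs[pid] = "gone"
            continue
        # Hypothetical fallback label if the State: line is missing.
        procs[pid] = "unknown"
        for line in lines:
            if line.startswith("State:"):
                procs[pid] = line.split(":", 1)[1].strip()
                break
    return state, procs
```

Calling freezer_snapshot("/sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8") on the affected agent would return the freezer state alongside the per-process states, which may help show which processes remain in the cgroup while the agent keeps retrying.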