[ https://issues.apache.org/jira/browse/MESOS-9966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943442#comment-16943442 ]

Qian Zhang commented on MESOS-9966:
-----------------------------------

Master:

commit 68c8b12e96a85f60578712152467d851eff4c643
Author: Qian Zhang 
Date: Wed Sep 18 16:34:05 2019 +0800

Gc'ed nested container sandbox only if we have root container sandbox.
 
 Review: [https://reviews.apache.org/r/71501]

commit a6445fa92e5b7e6ac3b9c9ef634b442d3310090c
Author: Qian Zhang 
Date: Thu Sep 19 16:33:46 2019 +0800

Added the test `GarbageCollectorIntegrationTest.ROOT_OrphanContainer`.
 
 Review: [https://reviews.apache.org/r/71518]

 

1.9.x:

commit 7e514dcbb96d951fae02484068286ceae8d34c4d
Author: Qian Zhang <zhq527...@gmail.com>
Date: Wed Sep 18 16:34:05 2019 +0800

Gc'ed nested container sandbox only if we have root container sandbox.
 
 Review: [https://reviews.apache.org/r/71501]

 

1.8.x:

commit 4941bf445902d519dda4943518016f269482a0b7
Author: Qian Zhang <zhq527...@gmail.com>
Date: Wed Sep 18 16:34:05 2019 +0800

Gc'ed nested container sandbox only if we have root container sandbox.
 
 Review: [https://reviews.apache.org/r/71501]

 

1.7.x:

commit aff4af5f5ef23edcc3025c410d41e297e2127ce0
Author: Qian Zhang <zhq527...@gmail.com>
Date: Wed Sep 18 16:34:05 2019 +0800

Gc'ed nested container sandbox only if we have root container sandbox.
 
 Review: [https://reviews.apache.org/r/71501]
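The fix named in the commit messages above amounts to a guard. A nested container's sandbox is only handed to the garbage collector when the root container's sandbox path is actually known.

The following is a minimal C++ sketch of that idea, not the code from r/71501. It uses `std::optional` in place of stout's `Option`. The `RootContainerState` struct, the `nestedSandboxToGc` function and the `<root sandbox>/containers/<child id>` layout are all illustrative assumptions.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical, simplified stand-in for the containerizer's per-container
// state (the real Mesos types differ). After agent recovery, a root
// container known only as an orphan may have no recorded sandbox path.
struct RootContainerState {
  std::optional<std::string> sandbox;  // empty when the sandbox is unknown
};

// Returns the nested container's sandbox path to schedule for GC, or
// std::nullopt when the root container's sandbox is unknown.
// Calling root.sandbox.value() unconditionally here would be the analogue
// of the failed `Option<T>::get()` assertion in the crash log.
// The "<root>/containers/<child>" layout is an assumption for illustration.
std::optional<std::string> nestedSandboxToGc(
    const RootContainerState& root, const std::string& nestedId) {
  assert(!nestedId.empty());
  if (!root.sandbox.has_value()) {
    return std::nullopt;  // skip GC instead of dereferencing an empty value
  }
  return *root.sandbox + "/containers/" + nestedId;
}
```

For a root container with no recorded sandbox, the function returns `std::nullopt`. The caller then skips GC instead of aborting the agent.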

> Agent crashes when trying to destroy orphaned nested container if root 
> container is orphaned as well
> ----------------------------------------------------------------------------------------------------
>
>                 Key: MESOS-9966
>                 URL: https://issues.apache.org/jira/browse/MESOS-9966
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization
>    Affects Versions: 1.7.3
>            Reporter: Jan Schlicht
>            Assignee: Qian Zhang
>            Priority: Critical
>
> Noticed an agent crash-looping when trying to recover. It recognized a 
> container and its nested container as orphaned. When trying to destroy the 
> nested container, the agent crashes, probably when trying to [get the sandbox 
> path of the root 
> container|https://github.com/apache/mesos/blob/master/src/slave/containerizer/mesos/containerizer.cpp#L2966].
> {noformat}
> 2019-09-09 05:04:26: I0909 05:04:26.382326 89950 linux_launcher.cpp:286] 
> Recovering Linux launcher
> 2019-09-09 05:04:26: I0909 05:04:26.383162 89950 linux_launcher.cpp:331] Not 
> recovering cgroup mesos/a127917b-96fe-4100-b73d-5f876ce9ffc1/mesos
> 2019-09-09 05:04:26: I0909 05:04:26.383199 89950 linux_launcher.cpp:343] 
> Recovered container 
> a127917b-96fe-4100-b73d-5f876ce9ffc1.9783e2bb-7c2e-4930-9d39-4225bb6f1b97
> 2019-09-09 05:04:26: I0909 05:04:26.383216 89950 linux_launcher.cpp:331] Not 
> recovering cgroup 
> mesos/a127917b-96fe-4100-b73d-5f876ce9ffc1/mesos/9783e2bb-7c2e-4930-9d39-4225bb6f1b97/mesos
> 2019-09-09 05:04:26: I0909 05:04:26.383229 89950 linux_launcher.cpp:343] 
> Recovered container 2ee154e2-3cc4-420a-99fb-065e740f3091
> 2019-09-09 05:04:26: I0909 05:04:26.383237 89950 linux_launcher.cpp:343] 
> Recovered container a127917b-96fe-4100-b73d-5f876ce9ffc1
> 2019-09-09 05:04:26: I0909 05:04:26.383249 89950 linux_launcher.cpp:343] 
> Recovered container 
> 2ee154e2-3cc4-420a-99fb-065e740f3091.49fe2bf9-17af-415f-92b6-92a4db619436
> 2019-09-09 05:04:26: I0909 05:04:26.383260 89950 linux_launcher.cpp:331] Not 
> recovering cgroup mesos/2ee154e2-3cc4-420a-99fb-065e740f3091/mesos
> 2019-09-09 05:04:26: I0909 05:04:26.383271 89950 linux_launcher.cpp:331] Not 
> recovering cgroup 
> mesos/2ee154e2-3cc4-420a-99fb-065e740f3091/mesos/49fe2bf9-17af-415f-92b6-92a4db619436/mesos
> 2019-09-09 05:04:26: I0909 05:04:26.383280 89950 linux_launcher.cpp:437] 
> 2ee154e2-3cc4-420a-99fb-065e740f3091.49fe2bf9-17af-415f-92b6-92a4db619436 is 
> a known orphaned container
> 2019-09-09 05:04:26: I0909 05:04:26.383289 89950 linux_launcher.cpp:437] 
> a127917b-96fe-4100-b73d-5f876ce9ffc1 is a known orphaned container
> 2019-09-09 05:04:26: I0909 05:04:26.383296 89950 linux_launcher.cpp:437] 
> 2ee154e2-3cc4-420a-99fb-065e740f3091 is a known orphaned container
> 2019-09-09 05:04:26: I0909 05:04:26.383304 89950 linux_launcher.cpp:437] 
> a127917b-96fe-4100-b73d-5f876ce9ffc1.9783e2bb-7c2e-4930-9d39-4225bb6f1b97 is 
> a known orphaned container
> 2019-09-09 05:04:26: I0909 05:04:26.383414 89950 containerizer.cpp:1092] 
> Recovering isolators
> 2019-09-09 05:04:26: I0909 05:04:26.385931 89977 memory.cpp:478] Started 
> listening for OOM events for container a127917b-96fe-4100-b73d-5f876ce9ffc1
> 2019-09-09 05:04:26: I0909 05:04:26.386118 89977 memory.cpp:590] Started 
> listening on 'low' memory pressure events for container 
> a127917b-96fe-4100-b73d-5f876ce9ffc1
> 2019-09-09 05:04:26: I0909 05:04:26.386152 89977 memory.cpp:590] Started 
> listening on 'medium' memory pressure events for container 
> a127917b-96fe-4100-b73d-5f876ce9ffc1
> 2019-09-09 05:04:26: I0909 05:04:26.386175 89977 memory.cpp:590] Started 
> listening on 'critical' memory pressure events for container 
> a127917b-96fe-4100-b73d-5f876ce9ffc1
> 2019-09-09 05:04:26: I0909 05:04:26.386227 89977 memory.cpp:478] Started 
> listening for OOM events for container 2ee154e2-3cc4-420a-99fb-065e740f3091
> 2019-09-09 05:04:26: I0909 05:04:26.386248 89977 memory.cpp:590] Started 
> listening on 'low' memory pressure events for container 
> 2ee154e2-3cc4-420a-99fb-065e740f3091
> 2019-09-09 05:04:26: I0909 05:04:26.386270 89977 memory.cpp:590] Started 
> listening on 'medium' memory pressure events for container 
> 2ee154e2-3cc4-420a-99fb-065e740f3091
> 2019-09-09 05:04:26: I0909 05:04:26.386376 89977 memory.cpp:590] Started 
> listening on 'critical' memory pressure events for container 
> 2ee154e2-3cc4-420a-99fb-065e740f3091
> 2019-09-09 05:04:26: I0909 05:04:26.386694 89921 containerizer.cpp:1131] 
> Recovering provisioner
> 2019-09-09 05:04:26: I0909 05:04:26.388226 90010 metadata_manager.cpp:286] 
> Successfully loaded 64 Docker images
> 2019-09-09 05:04:26: I0909 05:04:26.388420 89932 provisioner.cpp:494] 
> Provisioner recovery complete
> 2019-09-09 05:04:26: I0909 05:04:26.388530 90003 containerizer.cpp:1203] 
> Cleaning up orphan container 
> a127917b-96fe-4100-b73d-5f876ce9ffc1.9783e2bb-7c2e-4930-9d39-4225bb6f1b97
> 2019-09-09 05:04:26: I0909 05:04:26.388562 90003 containerizer.cpp:2520] 
> Destroying container 
> a127917b-96fe-4100-b73d-5f876ce9ffc1.9783e2bb-7c2e-4930-9d39-4225bb6f1b97 in 
> RUNNING state
> 2019-09-09 05:04:26: I0909 05:04:26.388576 90003 containerizer.cpp:3187] 
> Transitioning the state of container 
> a127917b-96fe-4100-b73d-5f876ce9ffc1.9783e2bb-7c2e-4930-9d39-4225bb6f1b97 
> from RUNNING to DESTROYING
> 2019-09-09 05:04:26: I0909 05:04:26.388640 90003 containerizer.cpp:1203] 
> Cleaning up orphan container a127917b-96fe-4100-b73d-5f876ce9ffc1
> 2019-09-09 05:04:26: I0909 05:04:26.388650 90003 containerizer.cpp:2520] 
> Destroying container a127917b-96fe-4100-b73d-5f876ce9ffc1 in RUNNING state
> 2019-09-09 05:04:26: I0909 05:04:26.388659 90003 containerizer.cpp:3187] 
> Transitioning the state of container a127917b-96fe-4100-b73d-5f876ce9ffc1 
> from RUNNING to DESTROYING
> 2019-09-09 05:04:26: I0909 05:04:26.388689 90003 containerizer.cpp:1203] 
> Cleaning up orphan container 
> 2ee154e2-3cc4-420a-99fb-065e740f3091.49fe2bf9-17af-415f-92b6-92a4db619436
> 2019-09-09 05:04:26: I0909 05:04:26.388698 90003 containerizer.cpp:2520] 
> Destroying container 
> 2ee154e2-3cc4-420a-99fb-065e740f3091.49fe2bf9-17af-415f-92b6-92a4db619436 in 
> RUNNING state
> 2019-09-09 05:04:26: I0909 05:04:26.388706 90003 containerizer.cpp:3187] 
> Transitioning the state of container 
> 2ee154e2-3cc4-420a-99fb-065e740f3091.49fe2bf9-17af-415f-92b6-92a4db619436 
> from RUNNING to DESTROYING
> 2019-09-09 05:04:26: I0909 05:04:26.388720 90003 containerizer.cpp:1203] 
> Cleaning up orphan container 2ee154e2-3cc4-420a-99fb-065e740f3091
> 2019-09-09 05:04:26: I0909 05:04:26.388729 90003 containerizer.cpp:2520] 
> Destroying container 2ee154e2-3cc4-420a-99fb-065e740f3091 in RUNNING state
> 2019-09-09 05:04:26: I0909 05:04:26.388737 90003 containerizer.cpp:3187] 
> Transitioning the state of container 2ee154e2-3cc4-420a-99fb-065e740f3091 
> from RUNNING to DESTROYING
> 2019-09-09 05:04:26: I0909 05:04:26.388783 90003 containerizer.cpp:3026] 
> Container 
> 2ee154e2-3cc4-420a-99fb-065e740f3091.49fe2bf9-17af-415f-92b6-92a4db619436 has 
> exited
> 2019-09-09 05:04:26: I0909 05:04:26.388837 89929 linux_launcher.cpp:576] 
> Asked to destroy container 
> a127917b-96fe-4100-b73d-5f876ce9ffc1.9783e2bb-7c2e-4930-9d39-4225bb6f1b97
> 2019-09-09 05:04:26: I0909 05:04:26.388904 89929 linux_launcher.cpp:618] 
> Destroying cgroup 
> '/sys/fs/cgroup/freezer/mesos/a127917b-96fe-4100-b73d-5f876ce9ffc1/mesos/9783e2bb-7c2e-4930-9d39-4225bb6f1b97'
> 2019-09-09 05:04:26: I0909 05:04:26.389147 89929 linux_launcher.cpp:576] 
> Asked to destroy container 
> 2ee154e2-3cc4-420a-99fb-065e740f3091.49fe2bf9-17af-415f-92b6-92a4db619436
> 2019-09-09 05:04:26: I0909 05:04:26.389173 89929 linux_launcher.cpp:618] 
> Destroying cgroup 
> '/sys/fs/cgroup/freezer/mesos/2ee154e2-3cc4-420a-99fb-065e740f3091/mesos/49fe2bf9-17af-415f-92b6-92a4db619436'
> 2019-09-09 05:04:26: I0909 05:04:26.389261 89947 cgroups.cpp:2854] Freezing 
> cgroup 
> /sys/fs/cgroup/freezer/mesos/a127917b-96fe-4100-b73d-5f876ce9ffc1/mesos/9783e2bb-7c2e-4930-9d39-4225bb6f1b97
> 2019-09-09 05:04:26: I0909 05:04:26.389269 89948 cgroups.cpp:2854] Freezing 
> cgroup 
> /sys/fs/cgroup/freezer/mesos/a127917b-96fe-4100-b73d-5f876ce9ffc1/mesos/9783e2bb-7c2e-4930-9d39-4225bb6f1b97/mesos
> 2019-09-09 05:04:26: I0909 05:04:26.389454 89953 cgroups.cpp:2854] Freezing 
> cgroup 
> /sys/fs/cgroup/freezer/mesos/2ee154e2-3cc4-420a-99fb-065e740f3091/mesos/49fe2bf9-17af-415f-92b6-92a4db619436
> 2019-09-09 05:04:26: I0909 05:04:26.389530 89956 cgroups.cpp:1242] 
> Successfully froze cgroup 
> /sys/fs/cgroup/freezer/mesos/a127917b-96fe-4100-b73d-5f876ce9ffc1/mesos/9783e2bb-7c2e-4930-9d39-4225bb6f1b97/mesos
>  after 166912ns
> 2019-09-09 05:04:26: I0909 05:04:26.389582 89965 cgroups.cpp:2854] Freezing 
> cgroup 
> /sys/fs/cgroup/freezer/mesos/2ee154e2-3cc4-420a-99fb-065e740f3091/mesos/49fe2bf9-17af-415f-92b6-92a4db619436/mesos
> 2019-09-09 05:04:26: I0909 05:04:26.389605 89937 cgroups.cpp:1242] 
> Successfully froze cgroup 
> /sys/fs/cgroup/freezer/mesos/a127917b-96fe-4100-b73d-5f876ce9ffc1/mesos/9783e2bb-7c2e-4930-9d39-4225bb6f1b97
>  after 269056ns
> 2019-09-09 05:04:26: I0909 05:04:26.389679 89964 cgroups.cpp:1242] 
> Successfully froze cgroup 
> /sys/fs/cgroup/freezer/mesos/2ee154e2-3cc4-420a-99fb-065e740f3091/mesos/49fe2bf9-17af-415f-92b6-92a4db619436
>  after 145920ns
> 2019-09-09 05:04:26: I0909 05:04:26.389761 89963 cgroups.cpp:2872] Thawing 
> cgroup 
> /sys/fs/cgroup/freezer/mesos/a127917b-96fe-4100-b73d-5f876ce9ffc1/mesos/9783e2bb-7c2e-4930-9d39-4225bb6f1b97/mesos
> 2019-09-09 05:04:26: I0909 05:04:26.389888 89969 cgroups.cpp:1242] 
> Successfully froze cgroup 
> /sys/fs/cgroup/freezer/mesos/2ee154e2-3cc4-420a-99fb-065e740f3091/mesos/49fe2bf9-17af-415f-92b6-92a4db619436/mesos
>  after 219136ns
> 2019-09-09 05:04:26: I0909 05:04:26.389904 89974 cgroups.cpp:2872] Thawing 
> cgroup 
> /sys/fs/cgroup/freezer/mesos/2ee154e2-3cc4-420a-99fb-065e740f3091/mesos/49fe2bf9-17af-415f-92b6-92a4db619436
> 2019-09-09 05:04:26: I0909 05:04:26.390111 89980 cgroups.cpp:2872] Thawing 
> cgroup 
> /sys/fs/cgroup/freezer/mesos/2ee154e2-3cc4-420a-99fb-065e740f3091/mesos/49fe2bf9-17af-415f-92b6-92a4db619436/mesos
> 2019-09-09 05:04:26: I0909 05:04:26.390151 89987 cgroups.cpp:1271] 
> Successfully thawed cgroup 
> /sys/fs/cgroup/freezer/mesos/2ee154e2-3cc4-420a-99fb-065e740f3091/mesos/49fe2bf9-17af-415f-92b6-92a4db619436
>  after 128us
> 2019-09-09 05:04:26: I0909 05:04:26.390199 89980 cgroups.cpp:1271] 
> Successfully thawed cgroup 
> /sys/fs/cgroup/freezer/mesos/2ee154e2-3cc4-420a-99fb-065e740f3091/mesos/49fe2bf9-17af-415f-92b6-92a4db619436/mesos
>  after 47104ns
> 2019-09-09 05:04:26: I0909 05:04:26.390290 89956 cgroups.cpp:2872] Thawing 
> cgroup 
> /sys/fs/cgroup/freezer/mesos/a127917b-96fe-4100-b73d-5f876ce9ffc1/mesos/9783e2bb-7c2e-4930-9d39-4225bb6f1b97
> 2019-09-09 05:04:26: I0909 05:04:26.390463 89983 linux_launcher.cpp:650] 
> Destroying cgroup 
> '/sys/fs/cgroup/systemd/mesos/2ee154e2-3cc4-420a-99fb-065e740f3091/mesos/49fe2bf9-17af-415f-92b6-92a4db619436'
> 2019-09-09 05:04:26: I0909 05:04:26.392710 89995 cgroups.cpp:1271] 
> Successfully thawed cgroup 
> /sys/fs/cgroup/freezer/mesos/a127917b-96fe-4100-b73d-5f876ce9ffc1/mesos/9783e2bb-7c2e-4930-9d39-4225bb6f1b97
>  after 2.397184ms
> 2019-09-09 05:04:26: I0909 05:04:26.394942 89976 containerizer.cpp:2812] 
> Checkpointing termination state to nested container's runtime directory 
> '/var/run/mesos/containers/2ee154e2-3cc4-420a-99fb-065e740f3091/containers/49fe2bf9-17af-415f-92b6-92a4db619436/termination'
> 2019-09-09 05:04:26: mesos-agent: 
> /pkg/src/mesos/3rdparty/stout/include/stout/option.hpp:119: T& 
> Option<T>::get() & [with T = std::basic_string<char>]: Assertion `isSome()' 
> failed.
> 2019-09-09 05:04:26: *** Aborted at 1568019866 (unix time) try "date -d 
> @1568019866" if you are using GNU date ***
> 2019-09-09 05:04:26: PC: @     0x7f8229cc02c7 __GI_raise
> 2019-09-09 05:04:26: *** SIGABRT (@0x15f32) received by PID 89906 (TID 
> 0x7f820c148700) from PID 89906; stack trace: ***
> 2019-09-09 05:04:26: @     0x7f822a066680 (unknown)
> 2019-09-09 05:04:26: @     0x7f8229cc02c7 __GI_raise
> 2019-09-09 05:04:26: @     0x7f8229cc19b8 __GI_abort
> 2019-09-09 05:04:26: @     0x7f8229cb90e6 __assert_fail_base
> 2019-09-09 05:04:26: @     0x7f8229cb9192 __GI___assert_fail
> 2019-09-09 05:04:26: @     0x7f822d306e33 _ZNR6OptionISsE3getEv.part.137
> 2019-09-09 05:04:26: @     0x7f822d317c4f 
> mesos::internal::slave::MesosContainerizerProcess::______destroy()
> 2019-09-09 05:04:26: I0909 05:04:26.418018 89974 token_retriever.cpp:422] 
> Successfuly acquired token with expiration set at 2019-09-09 09:09:26+00:00
> 2019-09-09 05:04:26: I0909 05:04:26.418375 89974 token_retriever.cpp:280] 
> Scheduling token refresh tu run at 2019-09-09 09:08:56.041828249+00:00
> 2019-09-09 05:04:26: @     0x7f822de72fc1 process::ProcessBase::consume()
> 2019-09-09 05:04:26: @     0x7f822de899ac process::ProcessManager::resume()
> 2019-09-09 05:04:26: @     0x7f822de8f466 
> _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUlvE_vEEE6_M_runEv
> 2019-09-09 05:04:26: @     0x7f822a840070 (unknown)
> 2019-09-09 05:04:26: @     0x7f822a05edd5 start_thread
> 2019-09-09 05:04:26: @     0x7f8229d88bfd __clone
> 2019-09-09 05:04:26: dcos-mesos-slave.service: main process exited, 
> code=killed, status=6/ABRT
> {noformat}
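The log shows how a nested container ID of the form `<root>.<child>` maps onto on-disk locations:

* Each component adds a `mesos/<id>` level under the freezer cgroup hierarchy.
* Each component adds a `containers/<id>` level under the runtime directory.

The hypothetical helpers below reproduce those two layouts as they appear in this particular log. A few assumptions apply:

* The `/sys/fs/cgroup/freezer` and `/var/run/mesos` prefixes are copied from the log, not read from agent configuration.
* Handling of deeper nesting is an extrapolation.
* The function names are illustrative, not Mesos APIs.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Splits a dotted container ID such as "<root>.<child>" into components.
std::vector<std::string> splitContainerId(const std::string& id) {
  std::vector<std::string> parts;
  std::stringstream ss(id);
  std::string part;
  while (std::getline(ss, part, '.')) {
    parts.push_back(part);
  }
  assert(!parts.empty());
  return parts;
}

// Hypothetical helper mirroring the freezer cgroup paths in the log:
// /sys/fs/cgroup/freezer/mesos/<root>/mesos/<child>
std::string freezerCgroup(const std::string& id) {
  std::string path = "/sys/fs/cgroup/freezer";  // prefix as seen in the log
  for (const std::string& part : splitContainerId(id)) {
    path += "/mesos/" + part;
  }
  return path;
}

// Hypothetical helper mirroring the termination checkpoint in the log:
// /var/run/mesos/containers/<root>/containers/<child>/termination
std::string terminationFile(const std::string& id) {
  std::string path = "/var/run/mesos";  // runtime directory as seen in the log
  for (const std::string& part : splitContainerId(id)) {
    path += "/containers/" + part;
  }
  return path + "/termination";
}
```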



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
