[ https://issues.apache.org/jira/browse/MESOS-9180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590356#comment-16590356 ]
Kirill Plyashkevich commented on MESOS-9180:
--------------------------------------------

Somewhat related to MESOS-8679, but in this case the kill is actually being retried.

> tasks get stuck in TASK_KILLING on the default executor
> -------------------------------------------------------
>
> Key: MESOS-9180
> URL: https://issues.apache.org/jira/browse/MESOS-9180
> Project: Mesos
> Issue Type: Bug
> Components: executor
> Affects Versions: 1.6.1
> Environment: Ubuntu 18.04, Ubuntu 16.04
> Reporter: Kirill Plyashkevich
> Priority: Critical
>
> During our load tests, tasks get stuck in the TASK_KILLING state:
> {quote}{noformat}
> I0823 16:30:20.367563 21608 executor.cpp:192] Version: 1.6.1
> I0823 16:30:20.439478 21684 default_executor.cpp:202] Received SUBSCRIBED
> event
> I0823 16:30:20.441012 21684 default_executor.cpp:206] Subscribed executor on
> XX.XXX.XX.XXX
> I0823 16:30:20.916216 21665 default_executor.cpp:202] Received LAUNCH_GROUP
> event
> I0823 16:30:20.917373 21645 default_executor.cpp:426] Setting
> 'MESOS_CONTAINER_IP' to: 172.26.10.222
> I0823 16:30:22.573794 21658 default_executor.cpp:202] Received ACKNOWLEDGED
> event
> I0823 16:30:22.575518 21637 default_executor.cpp:202] Received ACKNOWLEDGED
> event
> I0823 16:30:22.577137 21665 default_executor.cpp:202] Received ACKNOWLEDGED
> event
> I0823 16:30:33.091509 21642 default_executor.cpp:661] Finished launching
> tasks [
> test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka,
>
> test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.redis,
>
> test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery
> ] in child containers [
> 3680beff-96d2-4ebd-832c-9cbbddf8c507.8e04f74f-cb8b-46b9-8758-340455a844c8,
> 3680beff-96d2-4ebd-832c-9cbbddf8c507.fc60bf0f-5814-4ea9-a37f-89ebe3e2f5f7,
> 3680beff-96d2-4ebd-832c-9cbbddf8c507.ab481072-c8ab-4a76-be8b-7f4431220e7b ]
> I0823 16:30:33.091567 21642
default_executor.cpp:685] Waiting on child > containers of tasks [ > test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka, > > test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.redis, > > test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery > ] > I0823 16:30:33.096014 21647 default_executor.cpp:746] Waiting for child > container > 3680beff-96d2-4ebd-832c-9cbbddf8c507.8e04f74f-cb8b-46b9-8758-340455a844c8 of > task > 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka' > I0823 16:30:33.096310 21647 default_executor.cpp:746] Waiting for child > container > 3680beff-96d2-4ebd-832c-9cbbddf8c507.fc60bf0f-5814-4ea9-a37f-89ebe3e2f5f7 of > task > 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.redis' > I0823 16:30:33.096470 21647 default_executor.cpp:746] Waiting for child > container > 3680beff-96d2-4ebd-832c-9cbbddf8c507.ab481072-c8ab-4a76-be8b-7f4431220e7b of > task > 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery' > I0823 16:30:33.521510 21648 default_executor.cpp:202] Received ACKNOWLEDGED > event > I0823 16:30:33.522073 21652 default_executor.cpp:202] Received ACKNOWLEDGED > event > I0823 16:30:33.523569 21679 default_executor.cpp:202] Received ACKNOWLEDGED > event > I0823 16:30:38.593736 21668 checker_process.cpp:814] Output of the COMMAND > health check for task > 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.redis' > (stdout): > 0 > PONG > I0823 16:30:38.593777 21668 checker_process.cpp:817] Output of the COMMAND > health check for task > 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.redis' > (stderr): > I0823 16:30:38.610167 21650 checker_process.cpp:814] Output of the COMMAND > health check for task > 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka' > (stdout): > I0823 16:30:38.610194 21650 checker_process.cpp:817] Output of the COMMAND > health check for task > 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka' > (stderr): > I0823 16:30:38.700561 21681 checker_process.cpp:814] Output of the COMMAND > health check for task > 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery' > (stdout): > I0823 16:30:38.700598 21681 checker_process.cpp:817] Output of the COMMAND > health check for task > 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery' > (stderr): > I0823 16:30:42.786908 21649 checker_process.cpp:971] COMMAND health check for > task > 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.redis' > returned: 0 > I0823 16:30:42.787267 21649 default_executor.cpp:1375] Received task health > update for task > 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.redis', > task is healthy > I0823 16:30:45.156363 21658 default_executor.cpp:202] Received ACKNOWLEDGED > event > I0823 16:30:48.454120 21653 checker_process.cpp:971] COMMAND health check for > task > 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka' > returned: 1 > W0823 16:30:48.454218 21653 health_checker.cpp:283] COMMAND health check for > task > 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka' > failed: Command terminated with signal Hangup > W0823 16:30:48.454242 21653 health_checker.cpp:305] COMMAND health check for > task > 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka' > failed 1 times consecutively > I0823 16:30:48.454370 21653 default_executor.cpp:1375] Received task health > update for task > 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka', > task is not healthy > I0823 16:30:50.887114 21666 checker_process.cpp:971] COMMAND health check for > task > 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery' > returned: 1 > W0823 16:30:50.887183 21666 health_checker.cpp:283] COMMAND health check for > task > 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery' > failed: Command terminated with signal Hangup > W0823 16:30:50.887198 21666 health_checker.cpp:305] COMMAND health check for > task > 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery' > failed 1 times consecutively > I0823 16:30:50.887295 21657 default_executor.cpp:1375] Received task health > update for task > 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery', > task is not healthy > I0823 16:30:51.289993 21689 default_executor.cpp:202] Received ACKNOWLEDGED > event > I0823 16:30:51.607558 21659 default_executor.cpp:202] Received ACKNOWLEDGED > event > W0823 16:31:23.851263 21657 health_checker.cpp:273] COMMAND health check for > task > 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery' > failed: Command timed out after 5secs > W0823 16:31:23.851332 21657 health_checker.cpp:305] COMMAND health check for > task > 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery' > failed 2 times consecutively > I0823 16:31:23.851519 21641 default_executor.cpp:1375] Received task health > update for task > 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery', > task is not healthy > W0823 16:31:24.081169 21654 health_checker.cpp:273] COMMAND health check for > task > 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka' > failed: Command timed out after 5secs > W0823 16:31:24.081220 21654 health_checker.cpp:305] COMMAND health check for > task > 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka' > failed 2 times consecutively > I0823 16:31:24.081336 21654 default_executor.cpp:1375] Received task health > update for task > 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka', > task is not healthy > I0823 16:31:24.487970 21659 default_executor.cpp:202] Received ACKNOWLEDGED > event > I0823 16:31:26.176144 21682 default_executor.cpp:202] Received ACKNOWLEDGED > event > W0823 16:31:48.187378 21659 health_checker.cpp:273] COMMAND health check for > task > 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery' > failed: Command timed out after 5secs > W0823 16:31:48.187428 21659 health_checker.cpp:305] COMMAND health check for > task > 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery' > failed 3 times consecutively > I0823 16:31:48.187537 21659 default_executor.cpp:1375] Received task health > update for task > 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery', > task is not healthy > W0823 16:31:48.210490 21676 health_checker.cpp:273] COMMAND health check for > task > 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka' > failed: Command timed out after 5secs > W0823 16:31:48.210537 21676 health_checker.cpp:305] COMMAND health check for > task > 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka' > failed 3 times consecutively > I0823 16:31:48.210651 21676 default_executor.cpp:1375] Received task health > update for task > 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka', > task is not healthy > I0823 16:31:48.426265 21660 default_executor.cpp:202] Received ACKNOWLEDGED > event > I0823 16:31:48.427875 21640 default_executor.cpp:202] Received ACKNOWLEDGED > event > W0823 16:32:24.028173 21638 health_checker.cpp:273] COMMAND health check for > task > 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery' > failed: Command timed out after 5secs > W0823 16:32:24.028211 21638 health_checker.cpp:305] COMMAND health check for > task > 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery' > failed 4 times consecutively > I0823 16:32:24.028343 21638 default_executor.cpp:1375] Received task health > update for task > 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery', > task is not healthy > W0823 16:32:24.080215 21688 health_checker.cpp:273] COMMAND health check for > task > 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka' > failed: Command timed out after 5secs > W0823 16:32:24.080267 21688 health_checker.cpp:305] COMMAND health check for > task > 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka' > failed 4 times consecutively > I0823 16:32:24.080369 21688 default_executor.cpp:1375] Received task health > update for task > 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka', > task is not healthy > I0823 16:32:24.994634 21672 default_executor.cpp:202] Received ACKNOWLEDGED > event > I0823 16:32:25.002722 21683 default_executor.cpp:202] Received ACKNOWLEDGED > event > W0823 16:32:49.181438 21671 health_checker.cpp:273] COMMAND health check for > task > 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery' > failed: Command timed out 
after 5secs > W0823 16:32:49.181476 21671 health_checker.cpp:305] COMMAND health check for > task > 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery' > failed 5 times consecutively > I0823 16:32:49.181608 21671 default_executor.cpp:1375] Received task health > update for task > 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery', > task is not healthy > I0823 16:32:49.182938 21671 default_executor.cpp:1249] Received kill for task > 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery' > I0823 16:32:49.183014 21671 checker_process.cpp:281] Stopped COMMAND health > check for task > 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery' > I0823 16:32:49.183149 21671 default_executor.cpp:1124] Killing task > test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery > running in child container > 3680beff-96d2-4ebd-832c-9cbbddf8c507.ab481072-c8ab-4a76-be8b-7f4431220e7b > with SIGTERM signal > I0823 16:32:49.183159 21671 default_executor.cpp:1135] Scheduling escalation > to SIGKILL in 90secs from now > I0823 16:32:50.426288 21665 default_executor.cpp:202] Received ACKNOWLEDGED > event > I0823 16:32:50.430682 21682 default_executor.cpp:202] Received ACKNOWLEDGED > event > W0823 16:32:50.917691 21689 health_checker.cpp:273] COMMAND health check for > task > 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka' > failed: Command timed out after 5secs > W0823 16:32:50.917750 21689 health_checker.cpp:305] COMMAND health check for > task > 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka' > failed 5 times consecutively > I0823 16:32:50.917850 21680 default_executor.cpp:1375] Received task health > update for task > 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka', > task is not healthy > I0823 16:32:50.919066 21680 default_executor.cpp:1249] Received kill for task > 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka' > I0823 16:32:50.919119 21680 checker_process.cpp:281] Stopped COMMAND health > check for task > 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka' > I0823 16:32:50.919231 21680 default_executor.cpp:1124] Killing task > test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka > running in child container > 3680beff-96d2-4ebd-832c-9cbbddf8c507.8e04f74f-cb8b-46b9-8758-340455a844c8 > with SIGTERM signal > I0823 16:32:50.919241 21680 default_executor.cpp:1135] Scheduling escalation > to SIGKILL in 90secs from now > I0823 16:32:51.127272 21651 default_executor.cpp:202] Received ACKNOWLEDGED > event > I0823 16:32:51.130367 21670 default_executor.cpp:202] Received ACKNOWLEDGED > event > I0823 16:32:51.973668 21665 default_executor.cpp:953] Child container > 3680beff-96d2-4ebd-832c-9cbbddf8c507.8e04f74f-cb8b-46b9-8758-340455a844c8 of > task > 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka' > completed in state TASK_KILLED: Command terminated with signal Terminated > I0823 16:32:51.973721 21665 default_executor.cpp:974] Killing task group > containing tasks [ > test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka, > > test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.redis, > > test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery > ] > I0823 16:32:51.973819 21691 checker_process.cpp:281] Stopped COMMAND health > check for task > 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.redis' > I0823 
16:32:51.973997 21665 default_executor.cpp:1124] Killing task > test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.redis > running in child container > 3680beff-96d2-4ebd-832c-9cbbddf8c507.fc60bf0f-5814-4ea9-a37f-89ebe3e2f5f7 > with SIGTERM signal > I0823 16:32:51.974021 21665 default_executor.cpp:1135] Scheduling escalation > to SIGKILL in 3secs from now > I0823 16:32:51.975106 21665 default_executor.cpp:953] Child container > 3680beff-96d2-4ebd-832c-9cbbddf8c507.ab481072-c8ab-4a76-be8b-7f4431220e7b of > task > 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery' > completed in state TASK_KILLED: Command terminated with signal Terminated > I0823 16:32:51.995775 21671 default_executor.cpp:202] Received ACKNOWLEDGED > event > I0823 16:32:51.997719 21644 default_executor.cpp:202] Received ACKNOWLEDGED > event > I0823 16:32:52.003360 21676 default_executor.cpp:202] Received ACKNOWLEDGED > event > I0823 16:32:54.974514 21646 default_executor.cpp:1213] Task > 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.redis' > running in child container > 3680beff-96d2-4ebd-832c-9cbbddf8c507.fc60bf0f-5814-4ea9-a37f-89ebe3e2f5f7 did > not terminate after 3secs, sending SIGKILL to the container > W0823 16:32:54.982900 21650 default_executor.cpp:1222] Escalation to SIGKILL > the task > 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.redis' > running in child container > 3680beff-96d2-4ebd-832c-9cbbddf8c507.fc60bf0f-5814-4ea9-a37f-89ebe3e2f5f7 > failed: The agent failed to send signal Killed (9) to the container > 3680beff-96d2-4ebd-832c-9cbbddf8c507.fc60bf0f-5814-4ea9-a37f-89ebe3e2f5f7: > Unable to send signal to container: No such process; Retrying in 1secs > I0823 16:32:55.983327 21639 default_executor.cpp:1213] Task > 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.redis' > 
running in child container
> 3680beff-96d2-4ebd-832c-9cbbddf8c507.fc60bf0f-5814-4ea9-a37f-89ebe3e2f5f7 did
> not terminate after 3secs, sending SIGKILL to the container
> W0823 16:32:55.990069 21644 default_executor.cpp:1222] Escalation to SIGKILL
> the task
> 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.redis'
> running in child container
> 3680beff-96d2-4ebd-832c-9cbbddf8c507.fc60bf0f-5814-4ea9-a37f-89ebe3e2f5f7
> failed: The agent failed to send signal Killed (9) to the container
> 3680beff-96d2-4ebd-832c-9cbbddf8c507.fc60bf0f-5814-4ea9-a37f-89ebe3e2f5f7:
> Unable to send signal to container: No such process; Retrying in 1secs
> I0823 16:32:56.991422 21670 default_executor.cpp:1213] Task
> 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.redis'
> running in child container
> 3680beff-96d2-4ebd-832c-9cbbddf8c507.fc60bf0f-5814-4ea9-a37f-89ebe3e2f5f7 did
> not terminate after 3secs, sending SIGKILL to the container
> {noformat}{quote}
> After that, the executor loops forever, retrying to kill a process that no longer exists.
> The other form of this bug that we observed is:
> {quote}{noformat}
> I0823 11:19:44.460397 35632 default_executor.cpp:202] Received KILL event
> I0823 11:19:44.460433 35632 default_executor.cpp:1249] Received kill for task
> 'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis'
> W0823 11:19:44.460445 35632 default_executor.cpp:1259] Ignoring kill for task
> 'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis'
> as it is in the process of getting killed
> I0823 11:19:45.078868 35660 default_executor.cpp:1213] Task
> 'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis'
> running in child container
> ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 did
> not terminate after 1mins, sending SIGKILL to the container
> W0823
11:19:45.083555 35645 default_executor.cpp:1222] Escalation to SIGKILL > the task > 'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis' > running in child container > ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 > failed: The agent failed to send signal Killed (9) to the container > ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53: > Unable to send signal to container: No such process; Retrying in 1secs > I0823 11:19:46.084547 35637 default_executor.cpp:1213] Task > 'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis' > running in child container > ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 did > not terminate after 1mins, sending SIGKILL to the container > W0823 11:19:46.088583 35639 default_executor.cpp:1222] Escalation to SIGKILL > the task > 'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis' > running in child container > ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 > failed: The agent failed to send signal Killed (9) to the container > ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53: > Unable to send signal to container: No such process; Retrying in 1secs > I0823 11:19:47.089757 35630 default_executor.cpp:1213] Task > 'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis' > running in child container > ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 did > not terminate after 1mins, sending SIGKILL to the container > W0823 11:19:47.094741 35631 default_executor.cpp:1222] Escalation to SIGKILL > the task > 'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis' > running in child container > ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 > failed: The agent failed 
to send signal Killed (9) to the container > ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53: > Unable to send signal to container: No such process; Retrying in 1secs > I0823 11:19:48.095813 35623 default_executor.cpp:1213] Task > 'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis' > running in child container > ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 did > not terminate after 1mins, sending SIGKILL to the container > W0823 11:19:48.100821 35632 default_executor.cpp:1222] Escalation to SIGKILL > the task > 'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis' > running in child container > ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 > failed: The agent failed to send signal Killed (9) to the container > ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53: > Unable to send signal to container: No such process; Retrying in 1secs > I0823 11:19:49.101478 35627 default_executor.cpp:1213] Task > 'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis' > running in child container > ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 did > not terminate after 1mins, sending SIGKILL to the container > W0823 11:19:49.105983 35651 default_executor.cpp:1222] Escalation to SIGKILL > the task > 'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis' > running in child container > ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 > failed: The agent failed to send signal Killed (9) to the container > ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53: > Unable to send signal to container: No such process; Retrying in 1secs > I0823 11:19:50.106503 35662 default_executor.cpp:1213] Task > 
'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis' > running in child container > ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 did > not terminate after 1mins, sending SIGKILL to the container > W0823 11:19:50.111423 35723 default_executor.cpp:1222] Escalation to SIGKILL > the task > 'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis' > running in child container > ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 > failed: The agent failed to send signal Killed (9) to the container > ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53: > Unable to send signal to container: No such process; Retrying in 1secs > I0823 11:19:51.112059 35725 default_executor.cpp:1213] Task > 'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis' > running in child container > ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 did > not terminate after 1mins, sending SIGKILL to the container > W0823 11:19:51.116915 35664 default_executor.cpp:1222] Escalation to SIGKILL > the task > 'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis' > running in child container > ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 > failed: The agent failed to send signal Killed (9) to the container > ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53: > Unable to send signal to container: No such process; Retrying in 1secs > I0823 11:19:52.118046 35653 default_executor.cpp:1213] Task > 'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis' > running in child container > ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 did > not terminate after 1mins, sending SIGKILL to the container > W0823 11:19:52.122288 35616 
default_executor.cpp:1222] Escalation to SIGKILL > the task > 'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis' > running in child container > ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 > failed: The agent failed to send signal Killed (9) to the container > ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53: > Unable to send signal to container: No such process; Retrying in 1secs > I0823 11:19:53.123337 35658 default_executor.cpp:1213] Task > 'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis' > running in child container > ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 did > not terminate after 1mins, sending SIGKILL to the container > W0823 11:19:53.128535 35641 default_executor.cpp:1222] Escalation to SIGKILL > the task > 'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis' > running in child container > ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 > failed: The agent failed to send signal Killed (9) to the container > ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53: > Unable to send signal to container: No such process; Retrying in 1secs > I0823 11:19:54.129462 35633 default_executor.cpp:1213] Task > 'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis' > running in child container > ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 did > not terminate after 1mins, sending SIGKILL to the container > W0823 11:19:54.133767 35644 default_executor.cpp:1222] Escalation to SIGKILL > the task > 'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis' > running in child container > ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 > failed: The agent failed to send signal Killed 
(9) to the container > ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53: > Unable to send signal to container: No such process; Retrying in 1secs > I0823 11:19:55.134635 35618 default_executor.cpp:1213] Task > 'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis' > running in child container > ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 did > not terminate after 1mins, sending SIGKILL to the container > W0823 11:19:55.138553 35622 default_executor.cpp:1222] Escalation to SIGKILL > the task > 'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis' > running in child container > ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 > failed: The agent failed to send signal Killed (9) to the container > ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53: > Unable to send signal to container: No such process; Retrying in 1secs > I0823 11:19:56.139037 35638 default_executor.cpp:1213] Task > 'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis' > running in child container > ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 did > not terminate after 1mins, sending SIGKILL to the container > W0823 11:19:56.142948 35724 default_executor.cpp:1222] Escalation to SIGKILL > the task > 'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis' > running in child container > ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 > failed: The agent failed to send signal Killed (9) to the container > ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53: > Unable to send signal to container: No such process; Retrying in 1secs > I0823 11:19:57.143637 35659 default_executor.cpp:1213] Task > 
'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis' > running in child container > ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 did > not terminate after 1mins, sending SIGKILL to the container > W0823 11:19:57.148473 35636 default_executor.cpp:1222] Escalation to SIGKILL > the task > 'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis' > running in child container > ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 > failed: The agent failed to send signal Killed (9) to the container > ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53: > Unable to send signal to container: No such process; Retrying in 1secs > I0823 11:19:58.149035 35648 default_executor.cpp:1213] Task > 'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis' > running in child container > ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 did > not terminate after 1mins, sending SIGKILL to the container > W0823 11:19:58.152792 35621 default_executor.cpp:1222] Escalation to SIGKILL > the task > 'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis' > running in child container > ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 > failed: The agent failed to send signal Killed (9) to the container > ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53: > Unable to send signal to container: No such process; Retrying in 1secs > I0823 11:19:59.153236 35629 default_executor.cpp:1213] Task > 'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis' > running in child container > ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 did > not terminate after 1mins, sending SIGKILL to the container > W0823 11:19:59.157325 35656 
default_executor.cpp:1222] Escalation to SIGKILL the task 'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis' running in child container ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 failed: The agent failed to send signal Killed (9) to the container ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53: Unable to send signal to container: No such process; Retrying in 1secs
> I0823 11:20:00.158377 35660 default_executor.cpp:1213] Task 'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis' running in child container ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 did not terminate after 1mins, sending SIGKILL to the container
> W0823 11:20:00.162392 35627 default_executor.cpp:1222] Escalation to SIGKILL the task 'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis' running in child container ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 failed: The agent failed to send signal Killed (9) to the container ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53: Unable to send signal to container: No such process; Retrying in 1secs
> I0823 11:20:01.162860 35637 default_executor.cpp:1213] Task 'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis' running in child container ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 did not terminate after 1mins, sending SIGKILL to the container
> W0823 11:20:01.167155 35662 default_executor.cpp:1222] Escalation to SIGKILL the task 'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis' running in child container ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 failed: The agent failed to send signal Killed (9) to the container ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53: Unable to send signal to container: No such process; Retrying in 1secs
> I0823 11:20:02.167553 35630 default_executor.cpp:1213] Task 'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis' running in child container ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 did not terminate after 1mins, sending SIGKILL to the container
> W0823 11:20:02.172479 35725 default_executor.cpp:1222] Escalation to SIGKILL the task 'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis' running in child container ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 failed: The agent failed to send signal Killed (9) to the container ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53: Unable to send signal to container: No such process; Retrying in 1secs
> I0823 11:20:03.173439 35619 default_executor.cpp:1213] Task 'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis' running in child container ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 did not terminate after 1mins, sending SIGKILL to the container
> W0823 11:20:03.177597 35653 default_executor.cpp:1222] Escalation to SIGKILL the task 'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis' running in child container ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 failed: The agent failed to send signal Killed (9) to the container ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53: Unable to send signal to container: No such process; Retrying in 1secs
> I0823 11:20:04.178180 35727 default_executor.cpp:1213] Task 'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis' running in child container ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 did not terminate after 1mins, sending SIGKILL to the container
> W0823 11:20:04.182360 35658 default_executor.cpp:1222] Escalation to SIGKILL the task 'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis' running in child container ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 failed: The agent failed to send signal Killed (9) to the container ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53: Unable to send signal to container: No such process; Retrying in 1secs
> I0823 11:20:04.460697 35662 default_executor.cpp:202] Received KILL event
> {noformat}{quote}

--
This message was sent by Atlassian JIRA (v7.6.3#76005)