[ https://issues.apache.org/jira/browse/AURORA-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14620934#comment-14620934 ]

Brian Brazil commented on AURORA-1388:
--------------------------------------

From the mesos_slave log:
{noformat}
I0709 17:46:56.973000 32146 slave.cpp:551] Received SIGUSR1 signal from user root; unregistering and shutting down
I0709 17:46:56.974498 32146 slave.cpp:1745] Asked to shut down framework 20141204-181105-3258863626-3010-2174-0000 by @0.0.0.0:0
I0709 17:46:56.974972 32146 slave.cpp:1770] Shutting down framework 20141204-181105-3258863626-3010-2174-0000
I0709 17:46:56.976040 32146 slave.cpp:3443] Shutting down executor 'thermos-1436463818533-platform-prod-boxever_oauth_service-0-348277f3-0b0b-4eb9-aa0e-4b7399f894af' of framework 20141204-181105-3258863626-3010-2174-0000
I0709 17:47:01.979759 32140 slave.cpp:3513] Killing executor 'thermos-1436463818533-platform-prod-boxever_oauth_service-0-348277f3-0b0b-4eb9-aa0e-4b7399f894af' of framework 20141204-181105-3258863626-3010-2174-0000
I0709 17:47:01.980233 32142 containerizer.cpp:906] Destroying container '4d247816-d4ab-4256-a724-b5c6df7c1877'
I0709 17:47:02.034024 32143 containerizer.cpp:1111] Executor for container '4d247816-d4ab-4256-a724-b5c6df7c1877' has exited
I0709 17:47:02.035811 32145 slave.cpp:3193] Executor 'thermos-1436463818533-platform-prod-boxever_oauth_service-0-348277f3-0b0b-4eb9-aa0e-4b7399f894af' of framework 20141204-181105-3258863626-3010-2174-0000 terminated with signal Killed
I0709 17:47:02.037399 32145 slave.cpp:3302] Cleaning up executor 'thermos-1436463818533-platform-prod-boxever_oauth_service-0-348277f3-0b0b-4eb9-aa0e-4b7399f894af' of framework 20141204-181105-3258863626-3010-2174-0000
I0709 17:47:02.039018 32143 gc.cpp:56] Scheduling '/srv/mesos_slave/work_dir/slaves/20150709-173754-3258863626-3010-31252-S0/frameworks/20141204-181105-3258863626-3010-2174-0000/executors/thermos-1436463818533-platform-prod-boxever_oauth_service-0-348277f3-0b0b-4eb9-aa0e-4b7399f894af/runs/4d247816-d4ab-4256-a724-b5c6df7c1877' for gc 6.99999955261333days in the future
I0709 17:47:02.039146 32145 slave.cpp:3381] Cleaning up framework 20141204-181105-3258863626-3010-2174-0000
I0709 17:47:02.051360 32140 status_update_manager.cpp:279] Closing status update streams for framework 20141204-181105-3258863626-3010-2174-0000
I0709 17:47:02.051559 32145 slave.cpp:506] Slave terminating
I0709 17:47:02.039408 32143 gc.cpp:56] Scheduling '/srv/mesos_slave/work_dir/slaves/20150709-173754-3258863626-3010-31252-S0/frameworks/20141204-181105-3258863626-3010-2174-0000/executors/thermos-1436463818533-platform-prod-boxever_oauth_service-0-348277f3-0b0b-4eb9-aa0e-4b7399f894af' for gc 6.99999955068444days in the future
I0709 17:47:02.052327 32143 gc.cpp:56] Scheduling '/srv/mesos_slave/work_dir/meta/slaves/20150709-173754-3258863626-3010-31252-S0/frameworks/20141204-181105-3258863626-3010-2174-0000/executors/thermos-1436463818533-platform-prod-boxever_oauth_service-0-348277f3-0b0b-4eb9-aa0e-4b7399f894af/runs/4d247816-d4ab-4256-a724-b5c6df7c1877' for gc 6.99999954937481days in the future
{noformat}

From the thermos_runner.INFO log:
{noformat}
I0709 17:44:43.287700 1597 runner.py:789] Forking Process(app)

<no further logs, and nothing useful in DEBUG either>
{noformat}

From the ps output:
{noformat}
$ ps -fu platform
UID        PID  PPID  C STIME TTY          TIME CMD
platform  1607     1  0 17:44 ?        00:00:00 <thermos_runner>
platform  1609  1607  0 17:44 ?        00:00:00 /bin/bash -c ?  while sleep 300; do?
platform  1611  1609  0 17:44 ?        00:00:00 sleep 300
platform  1640     1  0 17:44 ?        00:00:00 <thermos_runner>
platform  1643  1640  0 17:44 ?        00:00:00 <main job bash container>
platform  1645  1643  6 17:44 ?        00:00:09 <main job>

{noformat}
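Both <thermos_runner> processes in the listing above have PPID 1, so they are no longer children of the executor that the slave killed, yet they still own their child processes (including the main job). A minimal diagnostic sketch for spotting such processes in `ps -f` output follows; the function name `orphaned_pids` is hypothetical and is not part of Aurora or Mesos.

```python
from typing import List


def orphaned_pids(ps_output: str) -> List[int]:
    """Return the PIDs whose parent is init (PPID 1) in `ps -f` output."""
    orphans = []
    lines = ps_output.strip().splitlines()
    for line in lines[1:]:  # skip the "UID PID PPID ..." header row
        # ps -f columns: UID PID PPID C STIME TTY TIME CMD. CMD may contain
        # spaces, so split at most 7 times to keep it as one trailing field.
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue  # ignore blank or truncated rows
        pid, ppid = int(fields[1]), int(fields[2])
        if ppid == 1:
            orphans.append(pid)
    return orphans
```

Run against the listing above, it returns [1607, 1640], the two <thermos_runner> processes.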

> If mesos_slave gets a SIGUSR1, thermos doesn't shutdown cleanly
> ---------------------------------------------------------------
>
>                 Key: AURORA-1388
>                 URL: https://issues.apache.org/jira/browse/AURORA-1388
>             Project: Aurora
>          Issue Type: Bug
>            Reporter: Brian Brazil
>
> https://issues.apache.org/jira/browse/MESOS-1475 allows a SIGUSR1 to be
> sent to a mesos slave to shut down the slave and its processes cleanly,
> which is useful for changing slave attributes.
> I tried this with my Aurora setup and, via tcpdump, found that it sent the
> first {{/shutdown}} HTTP request to the task but nothing after it. The
> process also kept running and, in my case, held onto a static port, which
> prevented tasks scheduled on that slave from working when it came back up.
> We should ensure that thermos behaves correctly when the mesos slave gets a
> SIGUSR1, following the lifecycle policy and ultimately killing the processes
> if needed.
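The issue asks thermos to follow the lifecycle policy and ultimately kill the processes if needed. Below is a minimal sketch of the "ultimately kill" step only, assuming a plain POSIX child process: send SIGTERM, wait a grace period, then SIGKILL. The name `stop_with_escalation` and its `grace_secs` parameter are hypothetical, not Thermos's actual API, and the HTTP lifecycle step (the {{/shutdown}} request mentioned above) is left out. In the slave log, the executor is killed about five seconds after it is asked to shut down (17:46:56.976 to 17:47:01.979), so any escalation the executor performs would need to fit inside that window.

```python
import signal
import subprocess


def stop_with_escalation(proc: subprocess.Popen, grace_secs: float = 2.0) -> int:
    """Ask a child to exit with SIGTERM, then SIGKILL it if it is still alive.

    Returns the exit status from Popen.wait(); a negative value -N means the
    child was terminated by signal N.
    """
    if proc.poll() is not None:
        return proc.returncode  # already exited, nothing to do
    proc.send_signal(signal.SIGTERM)  # polite request first
    try:
        return proc.wait(timeout=grace_secs)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX; cannot be caught or ignored
        return proc.wait()
```

This signals only the direct child. The ps output above shows grandchildren (such as <main job> under <main job bash container>), so a real fix would likely need to signal the whole process group or tree rather than a single PID.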



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
