[ https://issues.apache.org/jira/browse/MESOS-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15090611#comment-15090611 ]

Qian Zhang commented on MESOS-4279:
-----------------------------------

Thanks [~bydga], you are right: I should use {{args}} rather than {{cmd}} for my Marathon app. After I changed it to {{args}}, the script was terminated gracefully.
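My understanding of why {{args}} behaves differently from {{cmd}} (an assumption worth checking against the Mesos source for your version) is that the Docker containerizer wraps a shell command ({{cmd}}) in {{/bin/sh -c}} by overriding the image entrypoint, while {{args}} are passed straight to {{docker run}}. In the {{cmd}} case the shell, not the script, may end up as PID 1 in the container and receive the SIGTERM from "docker stop". The hypothetical {{docker_run_argv()}} helper below is a simplified model of that argument assembly, not Mesos's actual code, and it omits the many other options Mesos passes to {{docker run}}.

```python
from typing import List, Optional


def docker_run_argv(image: str, shell: bool, value: Optional[str] = None,
                    arguments: Optional[List[str]] = None) -> List[str]:
    """Hypothetical, simplified model of how the command part of a
    `docker run` invocation differs for Marathon `cmd` vs `args`."""
    argv = ["docker", "run"]
    if shell:
        # Marathon "cmd": the entrypoint is overridden with /bin/sh and the
        # command string is passed via -c, so the shell may become PID 1.
        argv += ["--entrypoint", "/bin/sh", image, "-c", value or ""]
    else:
        # Marathon "args": arguments go directly after the image, so with
        # ENTRYPOINT [] the script itself runs as PID 1 and gets SIGTERM.
        argv += [image] + (arguments or [])
    return argv


if __name__ == "__main__":
    print(docker_run_argv("mesos-4279", shell=True, value="./script.py"))
    print(docker_run_argv("mesos-4279", shell=False, arguments=["./script.py"]))
```

The point of the comparison is only which process receives the signal: with the shell variant, a SIGTERM handler installed in script.py never runs unless the shell forwards the signal.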

Here is the Dockerfile I used to build the Docker image "mesos-4279":
{code}
FROM python:2.7

RUN mkdir /app
ADD script.py /app
WORKDIR /app
ENTRYPOINT []
{code}

My Marathon app:
{code}
{
  "id": "app-docker1", 
  "args": ["./script.py"],
  "cpus": 0.1,
  "mem": 16.0,
  "instances": 1,
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "mesos-4279",
      "network": "BRIDGE"
    }
  }
}
{code}

When I created the app in Marathon, a corresponding Docker container was created:
{code}
# docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
3c37368be3fd        mesos-4279          "./script.py"       6 minutes ago       Up 6 minutes                            mesos-0e66b344-aee2-45be-b5ec-d606f3a14dfb-S0.9aa0680f-7f30-4f05-adf7-5759ec4ce066
{code}

Then, when I restarted the app in Marathon, the container log showed:
{code}
# docker logs -f 3c37368be3fd
Hello
12:35:21.677816
Iteration #1
12:35:22.678474
...
12:38:09.905829
Iteration #169
got 15
12:38:10.386334
12:38:12.388973
ending
Goodbye
{code}

As you can see, the script caught SIGTERM and handled it gracefully, which I think is the expected behavior, right? So I am not sure what happened in your environment. When you started the Mesos slave, did you specify {{--docker_stop_timeout}}? If not, it defaults to 0 seconds, which means "docker stop" sends SIGKILL right after SIGTERM, so the script has no chance to handle SIGTERM gracefully.

> Graceful restart of docker task
> -------------------------------
>
>                 Key: MESOS-4279
>                 URL: https://issues.apache.org/jira/browse/MESOS-4279
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization, docker
>    Affects Versions: 0.25.0
>            Reporter: Martin Bydzovsky
>            Assignee: Qian Zhang
>
> I'm implementing graceful restarts for our mesos-marathon-docker setup and ran 
> into the following issue
> (it was already discussed at 
> https://github.com/mesosphere/marathon/issues/2876, and the guys from Mesosphere 
> concluded that it's probably a Docker containerizer problem):
> To sum it up:
> When I deploy a simple Python script to all mesos-slaves:
> {code}
> #!/usr/bin/python
> from time import sleep
> import signal
> import sys
> import datetime
>
> # On SIGTERM/SIGINT: log the signal, simulate 2 seconds of cleanup, then exit.
> def sigterm_handler(_signo, _stack_frame):
>     print("got %i" % _signo)
>     print(datetime.datetime.now().time())
>     sys.stdout.flush()
>     sleep(2)
>     print(datetime.datetime.now().time())
>     print("ending")
>     sys.stdout.flush()
>     sys.exit(0)  # raises SystemExit, so the finally block below still runs
>
> signal.signal(signal.SIGTERM, sigterm_handler)
> signal.signal(signal.SIGINT, sigterm_handler)
>
> # Main loop: print a timestamp and an iteration counter once per second.
> try:
>     print("Hello")
>     i = 0
>     while True:
>         i += 1
>         print(datetime.datetime.now().time())
>         print("Iteration #%i" % i)
>         sys.stdout.flush()
>         sleep(1)
> finally:
>     print("Goodbye")
> {code}
> and run it through Marathon like this:
> {code:javascript}
> data = {
>       args: ["/tmp/script.py"],
>       instances: 1,
>       cpus: 0.1,
>       mem: 256,
>       id: "marathon-test-api"
> }
> {code}
> During an app restart I get the expected result: the task receives SIGTERM and 
> exits cleanly within the 2-second period specified in my script.
> But when I wrap this Python script in a Docker image:
> {code}
> FROM node:4.2
> RUN mkdir /app
> ADD . /app
> WORKDIR /app
> ENTRYPOINT []
> {code}
> and run the corresponding application through Marathon:
> {code:javascript}
> data = {
>       args: ["./script.py"],
>       container: {
>               type: "DOCKER",
>               docker: {
>                       image: "bydga/marathon-test-api",
>                       forcePullImage: true
>               }
>       },
>       cpus: 0.1,
>       mem: 256,
>       instances: 1,
>       id: "marathon-test-api"
> }
> {code}
> During a restart issued from Marathon, the task dies immediately without 
> having a chance to do any cleanup.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
