[ https://issues.apache.org/jira/browse/MESOS-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15095831#comment-15095831 ]
Martin Bydzovsky commented on MESOS-4279:
-----------------------------------------

Hi again,

So we finally managed to compile and deploy Mesos 0.27 - and I still can't get this working. :/ It's always the same output - restarting the app in Marathon results in:

{code}
...
Iteration #29
Killing docker task
Shutting down
{code}

We have dockerized both the mesos-master and the slave, so you can easily reproduce our setup like this:

{code:title=mesos-slave|borderStyle=solid}
docker run -it --privileged -p 5051:5051 -v /var/run/docker.sock:/var/run/docker.sock falsecz/mesos:git-468b8ec-with-docker mesos-slave --master=10.141.141.10:5050 --containerizers=mesos,docker --docker_stop_timeout=10secs --isolation=cgroups/cpu,cgroups/mem --advertise_ip=10.141.141.10 --no-switch_user --hostname=10.141.141.10
{code}

{code:title=mesos-master|borderStyle=solid}
docker run -it --privileged -p 5050:5050 falsecz/mesos:git-468b8ec mesos-master --work_dir=/tmp --advertise_ip=10.141.141.10 --hostname=10.141.141.10
{code}

{code:title=marathon|borderStyle=solid}
./start --master 10.141.141.10:5050 --zk zk://localhost:2181/marathon
{code}

No rocket science - the simplest setup possible - but I really don't see how your setup could differ from this one.

> Graceful restart of docker task
> -------------------------------
>
>                 Key: MESOS-4279
>                 URL: https://issues.apache.org/jira/browse/MESOS-4279
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization, docker
>    Affects Versions: 0.25.0
>            Reporter: Martin Bydzovsky
>            Assignee: Qian Zhang
>
> I'm implementing a graceful restart of our mesos-marathon-docker setup and I ran into the following issue
> (it was already discussed on https://github.com/mesosphere/marathon/issues/2876, and the guys from Mesosphere got to the point that it's probably a docker containerizer problem...).
>
> To sum it up: when I deploy this simple Python script to all mesos-slaves:
> {code}
> #!/usr/bin/python
> from time import sleep
> import signal
> import sys
> import datetime
>
> def sigterm_handler(_signo, _stack_frame):
>     print "got %i" % _signo
>     print datetime.datetime.now().time()
>     sys.stdout.flush()
>     sleep(2)
>     print datetime.datetime.now().time()
>     print "ending"
>     sys.stdout.flush()
>     sys.exit(0)
>
> signal.signal(signal.SIGTERM, sigterm_handler)
> signal.signal(signal.SIGINT, sigterm_handler)
>
> try:
>     print "Hello"
>     i = 0
>     while True:
>         i += 1
>         print datetime.datetime.now().time()
>         print "Iteration #%i" % i
>         sys.stdout.flush()
>         sleep(1)
> finally:
>     print "Goodbye"
> {code}
> and I run it through Marathon like:
> {code:javascript}
> data = {
>   args: ["/tmp/script.py"],
>   instances: 1,
>   cpus: 0.1,
>   mem: 256,
>   id: "marathon-test-api"
> }
> {code}
> then during an app restart I get the expected result - the task receives SIGTERM and dies peacefully (within my script-specified 2-second period).
>
> But when I wrap this Python script in a docker image:
> {code}
> FROM node:4.2
> RUN mkdir /app
> ADD . /app
> WORKDIR /app
> ENTRYPOINT []
> {code}
> and run the corresponding application through Marathon:
> {code:javascript}
> data = {
>   args: ["./script.py"],
>   container: {
>     type: "DOCKER",
>     docker: {
>       image: "bydga/marathon-test-api"
>     },
>     forcePullImage: true
>   },
>   cpus: 0.1,
>   mem: 256,
>   instances: 1,
>   id: "marathon-test-api"
> }
> {code}
> then during a restart (issued from Marathon) the task dies immediately, without getting a chance to do any cleanup.
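One way to narrow down whether this is a Mesos/Marathon problem or a container problem is to exercise the image directly with the Docker CLI, outside of Mesos. This is only a minimal sketch under a few assumptions: the bydga/marathon-test-api image from the report is available locally, ./script.py is runnable from the image's working directory, and `docker stop -t 10` is used to mirror the `--docker_stop_timeout=10secs` slave flag (Docker sends SIGTERM, waits the given number of seconds, then SIGKILLs).

{code:title=manual SIGTERM check (sketch)|borderStyle=solid}
# Start the container roughly the way Marathon would: empty entrypoint, args = ./script.py
docker run -d --name sigterm-test bydga/marathon-test-api ./script.py

# Send SIGTERM and allow up to 10 seconds for a clean exit
docker stop -t 10 sigterm-test

# If the handler ran, the log should end with "got 15" ... "ending"
# instead of cutting off in the middle of an iteration
docker logs sigterm-test
docker rm sigterm-test
{code}

If the handler fires here but not under Mesos, that points at the docker containerizer / stop-timeout handling; if it doesn't fire even here, the problem is in how the container delivers signals to the script.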