[
https://issues.apache.org/jira/browse/MESOS-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15091776#comment-15091776
]
Martin Bydzovsky commented on MESOS-4279:
-----------------------------------------
Thanks [~qianzhang] for your time. Well, this is the moment when I start to
feel bad: exactly this config is not working for me... :/ I've just started a
simple local Mesos-Marathon cluster with the following config:
mesos-slave http://10.141.141.10:5051/state.json
{code:javascript}
{
  attributes: {},
  build_date: "2015-10-12 20:57:28",
  build_time: 1444683448,
  build_user: "root",
  completed_frameworks: [],
  flags: {
    appc_store_dir: "/tmp/mesos/store/appc",
    authenticatee: "crammd5",
    cgroups_cpu_enable_pids_and_tids_count: "false",
    cgroups_enable_cfs: "false",
    cgroups_hierarchy: "/sys/fs/cgroup",
    cgroups_limit_swap: "false",
    cgroups_root: "mesos",
    container_disk_watch_interval: "15secs",
    containerizers: "docker,mesos",
    default_role: "*",
    disk_watch_interval: "1mins",
    docker: "docker",
    docker_kill_orphans: "true",
    docker_remove_delay: "6hrs",
    docker_socket: "/var/run/docker.sock",
    docker_stop_timeout: "10secs",
    enforce_container_disk_quota: "false",
    executor_registration_timeout: "5mins",
    executor_shutdown_grace_period: "5secs",
    fetcher_cache_dir: "/tmp/mesos/fetch",
    fetcher_cache_size: "2GB",
    frameworks_home: "",
    gc_delay: "1weeks",
    gc_disk_headroom: "0.1",
    hadoop_home: "",
    help: "false",
    hostname: "10.141.141.10",
    hostname_lookup: "true",
    image_provisioner_backend: "copy",
    initialize_driver_logging: "true",
    isolation: "posix/cpu,posix/mem",
    launcher_dir: "/usr/libexec/mesos",
    log_dir: "/var/log/mesos",
    logbufsecs: "0",
    logging_level: "INFO",
    master: "zk://localhost:2181/mesos",
    oversubscribed_resources_interval: "15secs",
    perf_duration: "10secs",
    perf_interval: "1mins",
    port: "5051",
    qos_correction_interval_min: "0ns",
    quiet: "false",
    recover: "reconnect",
    recovery_timeout: "15mins",
    registration_backoff_factor: "1secs",
    resource_monitoring_interval: "1secs",
    revocable_cpu_low_priority: "true",
    sandbox_directory: "/mnt/mesos/sandbox",
    strict: "true",
    switch_user: "true",
    systemd_runtime_directory: "/run/systemd/system",
    version: "false",
    work_dir: "/tmp/mesos"
  },
  git_sha: "2dd7f7ee115fe00b8e098b0a10762a4fa8f4600f",
  git_tag: "0.25.0",
  hostname: "10.141.141.10",
  id: "35e27fef-76b9-43f5-921d-83574ded0405-S0",
  log_dir: "/var/log/mesos",
  master_hostname: "mesos.vm",
  pid: "slave(1)@127.0.1.1:5051",
  resources: {
    cpus: 2,
    disk: 34068,
    mem: 1000,
    ports: "[31000-32000]"
  },
  start_time: 1452510028.844,
  version: "0.25.0"
}
{code}
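The flags that look most relevant to a graceful stop here are {{docker_stop_timeout}} (10secs) and {{executor_shutdown_grace_period}} (5secs). A quick sketch to pull just those out of the endpoint above so two clusters can be compared side by side (my own helper, not part of Mesos; {{urllib2}} is Python 2, matching the script in the description):
{code}
# Fetch the slave's state.json and print only the shutdown-related flags.
import json
import urllib2  # on Python 3 this would be urllib.request

SLAVE_STATE = "http://10.141.141.10:5051/state.json"

state = json.load(urllib2.urlopen(SLAVE_STATE))
for flag in ("docker_stop_timeout", "executor_shutdown_grace_period"):
    print("%s = %s" % (flag, state["flags"].get(flag)))
{code}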
mesos-master http://10.141.141.10:5050/state.json
{code:javascript}
{
  activated_slaves: 1,
  build_date: "2015-10-12 20:57:28",
  build_time: 1444683448,
  build_user: "root",
  completed_frameworks: [],
  deactivated_slaves: 0,
  elected_time: 1452509876.02982,
  flags: {
    allocation_interval: "1secs",
    allocator: "HierarchicalDRF",
    authenticate: "false",
    authenticate_slaves: "false",
    authenticators: "crammd5",
    authorizers: "local",
    framework_sorter: "drf",
    help: "false",
    hostname_lookup: "true",
    initialize_driver_logging: "true",
    log_auto_initialize: "true",
    log_dir: "/var/log/mesos",
    logbufsecs: "0",
    logging_level: "INFO",
    max_slave_ping_timeouts: "5",
    port: "5050",
    quiet: "false",
    quorum: "1",
    recovery_slave_removal_limit: "100%",
    registry: "replicated_log",
    registry_fetch_timeout: "1mins",
    registry_store_timeout: "5secs",
    registry_strict: "false",
    root_submissions: "true",
    slave_ping_timeout: "15secs",
    slave_reregister_timeout: "10mins",
    user_sorter: "drf",
    version: "false",
    webui_dir: "/usr/share/mesos/webui",
    work_dir: "/var/lib/mesos",
    zk: "zk://localhost:2181/mesos",
    zk_session_timeout: "10secs"
  },
  start_time: 1452509866.83634,
  unregistered_frameworks: [],
  version: "0.25.0"
}
{code}
marathon app-definition:
{code}
data = {
  args: ["./script.py"],
  container: {
    type: "DOCKER",
    docker: {
      image: "bydga/marathon-test-api"
    }
  },
  cpus: 0.1,
  mem: 256,
  instances: 1,
  id: "python-docker"
}
and the stdout of the task (note that the handler's "got 15" line never shows up; the task jumps straight from the last iteration to "Killing docker task"):
{code}
--container="mesos-35e27fef-76b9-43f5-921d-83574ded0405-S0.e918a301-d065-423d-9d0b-41a86f9aa15e"
--docker="docker" --docker_socket="/var/run/docker.sock" --help="false"
--initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
--mapped_directory="/mnt/mesos/sandbox" --quiet="false"
--sandbox_directory="/tmp/mesos/slaves/35e27fef-76b9-43f5-921d-83574ded0405-S0/frameworks/b3149af6-b380-4983-b595-879519d7678f-0001/executors/python-docker.91dfa48d-b852-11e5-814c-024222188950/runs/e918a301-d065-423d-9d0b-41a86f9aa15e"
--stop_timeout="10secs"
Registered docker executor on 10.141.141.10
Starting task python-docker.91dfa48d-b852-11e5-814c-024222188950
Hello
11:00:48.315871
Iteration #1
11:00:49.317315
...
Iteration #513
Killing docker task
Shutting down
{code}
docker version:
{code}
root@mesos:/home/vagrant# docker -v
Docker version 1.9.1, build a34a1d5
{code}
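To rule Mesos out entirely, the same image can also be stopped by hand: {{docker stop -t 10}} sends SIGTERM to PID 1 and only SIGKILLs after the 10-second grace period, which is what {{docker_stop_timeout}} should give us. A rough sketch of that check (assuming the image above is available locally):
{code}
# Start the container detached, stop it with a 10s grace period, and dump
# its logs; a working SIGTERM handler should leave "got 15" / "ending" there.
import subprocess

cid = subprocess.check_output(
    ["docker", "run", "-d", "bydga/marathon-test-api", "./script.py"]).strip()
subprocess.check_call(["docker", "stop", "-t", "10", cid])
print(subprocess.check_output(["docker", "logs", cid]))
{code}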
Can you paste your whole {{state.json}} so we can compare them?
> Graceful restart of docker task
> -------------------------------
>
> Key: MESOS-4279
> URL: https://issues.apache.org/jira/browse/MESOS-4279
> Project: Mesos
> Issue Type: Bug
> Components: containerization, docker
> Affects Versions: 0.25.0
> Reporter: Martin Bydzovsky
> Assignee: Qian Zhang
>
> I'm implementing graceful restarts of our Mesos-Marathon-Docker setup and I
> came across the following issue:
> (it was already discussed on
> https://github.com/mesosphere/marathon/issues/2876 and the guys from Mesosphere
> got to the point that it's probably a Docker containerizer problem...)
> To sum it up:
> When I deploy a simple Python script to all mesos-slaves:
> {code}
> #!/usr/bin/python
> from time import sleep
> import signal
> import sys
> import datetime
>
> def sigterm_handler(_signo, _stack_frame):
>     print "got %i" % _signo
>     print datetime.datetime.now().time()
>     sys.stdout.flush()
>     sleep(2)
>     print datetime.datetime.now().time()
>     print "ending"
>     sys.stdout.flush()
>     sys.exit(0)
>
> signal.signal(signal.SIGTERM, sigterm_handler)
> signal.signal(signal.SIGINT, sigterm_handler)
>
> try:
>     print "Hello"
>     i = 0
>     while True:
>         i += 1
>         print datetime.datetime.now().time()
>         print "Iteration #%i" % i
>         sys.stdout.flush()
>         sleep(1)
> finally:
>     print "Goodbye"
> {code}
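> Outside of Mesos the handler is easy to verify; a minimal sketch (my own test
> snippet, not part of the setup) that starts the script and SIGTERMs it by hand:
> {code}
> # Launch script.py (assumed executable, chmod +x), send SIGTERM (signal 15,
> # what `docker stop` sends first), and wait; the exit should take ~2 seconds
> # and print "got 15" / "ending".
> import signal
> import subprocess
> import time
>
> proc = subprocess.Popen(["/tmp/script.py"])
> time.sleep(3)                     # let a few iterations print
> proc.send_signal(signal.SIGTERM)
> proc.wait()
> {code}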
> and I run it through Marathon like
> {code:javascript}
> data = {
>   args: ["/tmp/script.py"],
>   instances: 1,
>   cpus: 0.1,
>   mem: 256,
>   id: "marathon-test-api"
> }
> {code}
> During an app restart I get the expected result: the task receives SIGTERM and
> dies peacefully (within my script-specified 2-second period).
> But when I wrap this Python script in a Docker image:
> {code}
> FROM node:4.2
> RUN mkdir /app
> ADD . /app
> WORKDIR /app
> ENTRYPOINT []
> {code}
> and run the corresponding application through Marathon:
> {code:javascript}
> data = {
>   args: ["./script.py"],
>   container: {
>     type: "DOCKER",
>     docker: {
>       image: "bydga/marathon-test-api",
>       forcePullImage: true
>     }
>   },
>   cpus: 0.1,
>   mem: 256,
>   instances: 1,
>   id: "marathon-test-api"
> }
> {code}
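> (For completeness: I submit these app definitions with a plain POST to
> Marathon's {{/v2/apps}} endpoint; a sketch of that, assuming Marathon is
> listening on its default port 8080:)
> {code}
> # POST an app definition (saved as app.json) to Marathon's REST API.
> import json
> import urllib2  # Python 2, matching the script above
>
> app = json.load(open("app.json"))  # the definition shown above
> req = urllib2.Request("http://10.141.141.10:8080/v2/apps",
>                       json.dumps(app),
>                       {"Content-Type": "application/json"})
> print(urllib2.urlopen(req).read())
> {code}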
> During a restart (issued from Marathon), the task dies immediately, without
> getting a chance to do any cleanup.