[ 
https://issues.apache.org/jira/browse/MESOS-4827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184898#comment-15184898
 ] 

Geoffroy Jabouley commented on MESOS-4827:
------------------------------------------

Hi,

We are also randomly hitting this issue (Mesos 0.25.0), and the most annoying 
part is that slave recovery does not work, so all tasks get restarted. This 
can lead to unwanted results when persistent data ends up being used by two 
Mesos tasks at the same time.

Here is the scenario of the latest failure:

+INIT:+ Two Mesos tasks, A and B, are running on slave S0, each in a Docker 
container.

+ACTION:+ We restart task A through Marathon.

+EXPECTED:+ Task A is restarted on the Mesos cluster.

+RESULTS:+

1- Task A is killed by slave S0, but slave S0 then crashes because of a 
*CHECK_SOME(os::touch(path))* error:
{noformat}
I0308 11:39:00.489042 32151 slave.cpp:1789] Asked to kill task 
taskA.1832cbe1-da4c-11e5-88d2-7e19a177dadf of framework 
87b609e3-1591-48da-b7d0-19e77c63e8a7-0000
I0308 11:39:10.866835 32145 slave.cpp:2717] Handling status update TASK_KILLED 
(UUID: a71e4a06-5f1a-4adc-8514-53a86053bc1a) for task 
taskA.1832cbe1-da4c-11e5-88d2-7e19a177dadf of framework 
87b609e3-1591-48da-b7d0-19e77c63e8a7-0000 from executor(1)@10.195.30.138:47413
E0308 11:39:10.867081 32146 slave.cpp:2911] Failed to update resources for 
container 8ef81db0-862c-4ddf-8510-92dac9a6c58f of executor 
taskA.1832cbe1-da4c-11e5-88d2-7e19a177dadf running task 
taskA.1832cbe1-da4c-11e5-88d2-7e19a177dadf on status update for terminal task, 
destroying container: Failed to determine cgroup for the 'cpu' subsystem: 
Failed to read /proc/3972/cgroup: Failed to open file '/proc/3972/cgroup': No 
such file or directory
I0308 11:39:10.867172 32145 docker.cpp:1390] Destroying container 
'8ef81db0-862c-4ddf-8510-92dac9a6c58f'
I0308 11:39:10.867178 32146 status_update_manager.cpp:322] Received status 
update TASK_KILLED (UUID: a71e4a06-5f1a-4adc-8514-53a86053bc1a) for task 
taskA.1832cbe1-da4c-11e5-88d2-7e19a177dadf of framework 
87b609e3-1591-48da-b7d0-19e77c63e8a7-0000
I0308 11:39:10.867188 32145 docker.cpp:1452] Sending SIGTERM to executor with 
pid: 3927
I0308 11:39:10.867194 32146 status_update_manager.cpp:826] Checkpointing UPDATE 
for status update TASK_KILLED (UUID: a71e4a06-5f1a-4adc-8514-53a86053bc1a) for 
task taskA.1832cbe1-da4c-11e5-88d2-7e19a177dadf of framework 
87b609e3-1591-48da-b7d0-19e77c63e8a7-0000
I0308 11:39:10.867949 32150 slave.cpp:3016] Forwarding the update TASK_KILLED 
(UUID: a71e4a06-5f1a-4adc-8514-53a86053bc1a) for task 
taskA.1832cbe1-da4c-11e5-88d2-7e19a177dadf of framework 
87b609e3-1591-48da-b7d0-19e77c63e8a7-0000 to [email protected]:5050
I0308 11:39:10.868008 32150 slave.cpp:2946] Sending acknowledgement for status 
update TASK_KILLED (UUID: a71e4a06-5f1a-4adc-8514-53a86053bc1a) for task 
taskA.1832cbe1-da4c-11e5-88d2-7e19a177dadf of framework 
87b609e3-1591-48da-b7d0-19e77c63e8a7-0000 to executor(1)@10.195.30.138:47413
I0308 11:39:10.877274 32145 docker.cpp:1494] Running docker stop on container 
'8ef81db0-862c-4ddf-8510-92dac9a6c58f'
I0308 11:39:10.966596 32144 docker.cpp:1592] Executor for container 
'8ef81db0-862c-4ddf-8510-92dac9a6c58f' has exited
I0308 11:39:10.966778 32146 slave.cpp:3440] Executor 
'taskA.1832cbe1-da4c-11e5-88d2-7e19a177dadf' of framework 
87b609e3-1591-48da-b7d0-19e77c63e8a7-0000 terminated with signal Terminated
I0308 11:39:10.978411 32146 status_update_manager.cpp:394] Received status 
update acknowledgement (UUID: a71e4a06-5f1a-4adc-8514-53a86053bc1a) for task 
taskA.1832cbe1-da4c-11e5-88d2-7e19a177dadf of framework 
87b609e3-1591-48da-b7d0-19e77c63e8a7-0000
I0308 11:39:10.978487 32146 status_update_manager.cpp:826] Checkpointing ACK 
for status update TASK_KILLED (UUID: a71e4a06-5f1a-4adc-8514-53a86053bc1a) for 
task taskA.1832cbe1-da4c-11e5-88d2-7e19a177dadf of framework 
87b609e3-1591-48da-b7d0-19e77c63e8a7-0000
I0308 11:39:10.979449 32146 slave.cpp:3544] Cleaning up executor 
'taskA.1832cbe1-da4c-11e5-88d2-7e19a177dadf' of framework 
87b609e3-1591-48da-b7d0-19e77c63e8a7-0000
F0308 11:39:10.984439 32146 slave.cpp:3570] CHECK_SOME(os::touch(path)): Failed 
to open file: No such file or directory 
*** Check failure stack trace: ***
    @     0x7fdc3c3394dd  google::LogMessage::Fail()
    @     0x7fdc3c33b21c  google::LogMessage::SendToLog()
    @     0x7fdc3c3390cc  google::LogMessage::Flush()
    @     0x7fdc3c33bb19  google::LogMessageFatal::~LogMessageFatal()
    @     0x7fdc3bdbef2e  mesos::internal::slave::Slave::removeExecutor()
    @     0x7fdc3bdc068e  
mesos::internal::slave::Slave::_statusUpdateAcknowledgement()
    @     0x7fdc3c2eb541  process::ProcessManager::resume()
    @     0x7fdc3c2eb83f  process::internal::schedule()
    @     0x7fdc3ae6e220  (unknown)
    @     0x7fdc3b0c8dc5  start_thread
    @     0x7fdc3a8d821d  __clone
{noformat}
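For context, the fatal line corresponds to the slave touching a marker file 
while cleaning up the terminated executor; CHECK_SOME is fatal, so a failed 
os::touch() takes the whole slave process down. Below is a minimal, 
self-contained sketch of that pattern (illustrative only, not the actual 
Mesos/stout source; the marker path and helper names are made up):
{code:cpp}
// Minimal sketch of the CHECK_SOME(os::touch(path)) pattern (illustrative
// only). The point: if touching the marker file fails (e.g. because the
// executor's directory was already removed), the whole process aborts.
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>

struct TryNothing {            // stand-in for stout's Try<Nothing>
  bool ok;
  std::string error;
};

TryNothing touch(const std::string& path) {
  std::ofstream file(path, std::ios::app);
  if (!file) {
    return {false, "Failed to open file: " + path};
  }
  return {true, ""};
}

// Stand-in for CHECK_SOME: log and abort the process on error.
#define CHECK_SOME(expr)                                           \
  do {                                                             \
    TryNothing _t = (expr);                                        \
    if (!_t.ok) {                                                  \
      std::cerr << "CHECK_SOME(" #expr "): " << _t.error << "\n";  \
      std::abort();   /* this is what kills the slave */           \
    }                                                              \
  } while (false)

int main() {
  // Hypothetical marker path; in the crash above the parent directory
  // no longer exists, so touch() fails and the process aborts.
  std::string path = "/nonexistent-dir/executor.completed";
  CHECK_SOME(touch(path));
  return 0;
}
{code}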
\\
\\
2- The Mesos master detects that the slave is down
{noformat}
I0308 11:39:11.128128 31416 master.cpp:1080] Slave 
87b609e3-1591-48da-b7d0-19e77c63e8a7-S0 at slave(1)@10.195.30.138:5051 
(10.195.30.138) disconnected
I0308 11:39:11.128183 31416 master.cpp:2534] Disconnecting slave 
87b609e3-1591-48da-b7d0-19e77c63e8a7-S0 at slave(1)@10.195.30.138:5051 
(10.195.30.138)
I0308 11:39:11.128224 31416 master.cpp:2553] Deactivating slave 
87b609e3-1591-48da-b7d0-19e77c63e8a7-S0 at slave(1)@10.195.30.138:5051 
(10.195.30.138)
I0308 11:39:11.128257 31419 hierarchical.hpp:768] Slave 
87b609e3-1591-48da-b7d0-19e77c63e8a7-S0 deactivated
{noformat}
\\
\\
3- The Mesos slave is restarted (by systemd) and tries to reregister with the 
Mesos cluster. It does not detect any checkpointed resources in 
/tmp/mesos/meta/resources/resources.info and fails to find the latest slave 
in '/tmp/mesos/meta', so it registers as a brand-new slave S5 with no 
associated tasks (see the recovery sketch after the log below).
{noformat}
I0308 11:39:31.390815 24763 logging.cpp:172] INFO level logging started!
I0308 11:39:31.390966 24763 main.cpp:185] Build: 2015-10-12 20:59:01 by root
I0308 11:39:31.390974 24763 main.cpp:187] Version: 0.25.0
I0308 11:39:31.390976 24763 main.cpp:190] Git tag: 0.25.0
I0308 11:39:31.390980 24763 main.cpp:194] Git SHA: 
2dd7f7ee115fe00b8e098b0a10762a4fa8f4600f
2016-03-08 11:39:31,493:24763(0x7f97dacc5700):ZOO_INFO@log_env@712: Client 
environment:zookeeper.version=zookeeper C client 3.4.5
2016-03-08 11:39:31,493:24763(0x7f97dacc5700):ZOO_INFO@log_env@716: Client 
environment:host.name=ffaas-master-2
2016-03-08 11:39:31,493:24763(0x7f97dacc5700):ZOO_INFO@log_env@723: Client 
environment:os.name=Linux
2016-03-08 11:39:31,493:24763(0x7f97dacc5700):ZOO_INFO@log_env@724: Client 
environment:os.arch=3.10.0-327.el7.x86_64
2016-03-08 11:39:31,493:24763(0x7f97dacc5700):ZOO_INFO@log_env@725: Client 
environment:os.version=#1 SMP Thu Nov 19 22:10:57 UTC 2015
I0308 11:39:31.493149 24763 main.cpp:272] Starting Mesos slave
2016-03-08 11:39:31,493:24763(0x7f97dacc5700):ZOO_INFO@log_env@733: Client 
environment:user.name=(null)
2016-03-08 11:39:31,493:24763(0x7f97dacc5700):ZOO_INFO@log_env@741: Client 
environment:user.home=/root
2016-03-08 11:39:31,493:24763(0x7f97dacc5700):ZOO_INFO@log_env@753: Client 
environment:user.dir=/
2016-03-08 11:39:31,493:24763(0x7f97dacc5700):ZOO_INFO@zookeeper_init@786: 
Initiating client connection, 
host=10.195.30.137:2181,10.195.30.138:2181,10.195.30.139:2181 
sessionTimeout=10000 watcher=0x7f97e4963ca0 sessionId=0 sessionPasswd=<null> 
context=0x7f97c4000fd0 flags=0
I0308 11:39:31.493613 24763 slave.cpp:190] Slave started on 
1)@10.195.30.138:5051
I0308 11:39:31.493621 24763 slave.cpp:191] Flags at startup: 
--appc_store_dir="/tmp/mesos/store/appc" --authenticatee="crammd5" 
--cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" 
--cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" 
--cgroups_root="mesos" --container_disk_watch_interval="15secs" 
--containerizers="docker" --default_role="*" --disk_watch_interval="1mins" 
--docker="docker" --docker_kill_orphans="true" --docker_remove_delay="3days" 
--docker_socket="/var/run/docker.sock" --docker_stop_timeout="10secs" 
--enforce_container_disk_quota="false" --executor_registration_timeout="5mins" 
--executor_shutdown_grace_period="5secs" 
--external_log_file="/var/log/mesos/slave.log" 
--fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB" 
--frameworks_home="" --gc_delay="3days" --gc_disk_headroom="0.1" 
--hadoop_home="" --help="false" --hostname="10.195.30.138" 
--hostname_lookup="true" --image_provisioner_backend="copy" 
--initialize_driver_logging="true" --ip="10.195.30.138" 
--isolation="cgroups/cpu,cgroups/mem" --launcher_dir="/usr/libexec/mesos" 
--log_dir="/var/log/mesos" --logbufsecs="0" --logging_level="INFO" 
--master="zk://10.195.30.137:2181,10.195.30.138:2181,10.195.30.139:2181/mesos" 
--oversubscribed_resources_interval="15secs" --perf_duration="10secs" 
--perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" 
--quiet="false" --recover="reconnect" --recovery_timeout="30mins" 
--registration_backoff_factor="1secs" --resource_monitoring_interval="10secs" 
--resources="ports:[80-80,443-443,31000-33000]" 
--revocable_cpu_low_priority="true" --sandbox_directory="/mnt/mesos/sandbox" 
--strict="true" --switch_user="true" 
--systemd_runtime_directory="/run/systemd/system" --version="false" 
--work_dir="/tmp/mesos"
I0308 11:39:31.493861 24763 slave.cpp:354] Slave resources: ports(*):[80-80, 
443-443, 31000-33000]; cpus(*):8; mem(*):30791; disk(*):46051
I0308 11:39:31.493978 24763 slave.cpp:390] Slave hostname: 10.195.30.138
I0308 11:39:31.493983 24763 slave.cpp:395] Slave checkpoint: true
2016-03-08 11:39:31,494:24763(0x7f97d8aaf700):ZOO_INFO@check_events@1703: 
initiated connection to server [10.195.30.139:2181]
2016-03-08 11:39:31,495:24763(0x7f97d8aaf700):ZOO_INFO@check_events@1750: 
session establishment complete on server [10.195.30.139:2181], 
sessionId=0x3532c83e3180071, negotiated timeout=10000
I0308 11:39:31.495776 24765 state.cpp:54] Recovering state from 
'/tmp/mesos/meta'
I0308 11:39:31.495810 24769 group.cpp:331] Group process 
(group(1)@10.195.30.138:5051) connected to ZooKeeper
I0308 11:39:31.495815 24765 state.cpp:690] No checkpointed resources found at 
'/tmp/mesos/meta/resources/resources.info'
I0308 11:39:31.495834 24769 group.cpp:805] Syncing group operations: queue size 
(joins, cancels, datas) = (0, 0, 0)
I0308 11:39:31.495841 24769 group.cpp:403] Trying to create path '/mesos' in 
ZooKeeper
I0308 11:39:31.495846 24765 state.cpp:97] Failed to find the latest slave from 
'/tmp/mesos/meta'
I0308 11:39:31.496683 24768 detector.cpp:156] Detected a new leader: (id='1')
I0308 11:39:31.496736 24765 group.cpp:674] Trying to get 
'/mesos/json.info_0000000001' in ZooKeeper
I0308 11:39:31.497294 24769 detector.cpp:481] A new leading master 
([email protected]:5050) is detected
I0308 11:39:31.503237 24765 status_update_manager.cpp:202] Recovering status 
update manager
I0308 11:39:31.503558 24765 docker.cpp:535] Recovering Docker containers
I0308 11:39:31.506376 24770 slave.cpp:4110] Finished recovery
I0308 11:39:31.506497 24770 slave.cpp:4143] Garbage collecting old slave 
87b609e3-1591-48da-b7d0-19e77c63e8a7-S0
I0308 11:39:31.506544 24765 gc.cpp:56] Scheduling 
'/tmp/mesos/slaves/87b609e3-1591-48da-b7d0-19e77c63e8a7-S0' for gc 
2.99999413761481days in the future
I0308 11:39:31.506595 24767 status_update_manager.cpp:176] Pausing sending 
status updates
I0308 11:39:31.506603 24771 slave.cpp:705] New master detected at 
[email protected]:5050
I0308 11:39:31.506701 24765 gc.cpp:56] Scheduling 
'/tmp/mesos/meta/slaves/87b609e3-1591-48da-b7d0-19e77c63e8a7-S0' for gc 
2.99999413740741days in the future
I0308 11:39:31.506810 24771 slave.cpp:730] No credentials provided. Attempting 
to register without authentication
I0308 11:39:31.506865 24771 slave.cpp:741] Detecting new master
I0308 11:39:32.356058 24767 slave.cpp:880] Registered with master 
[email protected]:5050; given slave ID 
87b609e3-1591-48da-b7d0-19e77c63e8a7-S5
I0308 11:39:32.356493 24766 status_update_manager.cpp:183] Resuming sending 
status updates
I0308 11:39:32.356606 24767 slave.cpp:939] Forwarding total oversubscribed 
resources 
{noformat}
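The "Failed to find the latest slave" message suggests the slave could not 
resolve its previous slave ID from the checkpointed state in the work 
directory. As far as I understand, recovery relies on a 'latest' symlink 
under <work_dir>/meta/slaves pointing at the previous slave ID directory; the 
sketch below (assumed layout, not the real recovery code) shows the kind of 
check that decides between recovering the old ID and registering as a new 
slave:
{code:cpp}
// Illustrative-only sketch of how recovery might decide whether a previous
// slave ID can be recovered. Assumes the "<work_dir>/meta/slaves/latest"
// symlink layout; the real Mesos recovery code differs in detail.
#include <filesystem>
#include <iostream>
#include <string>

namespace fs = std::filesystem;

int main() {
  const fs::path workDir = "/tmp/mesos";  // --work_dir from the flags above
  const fs::path latest  = workDir / "meta" / "slaves" / "latest";

  std::error_code ec;
  if (!fs::is_symlink(latest, ec) || ec) {
    std::cout << "Failed to find the latest slave from '"
              << (workDir / "meta").string() << "'\n"
              << "=> the slave will register with a new slave ID\n";
    return 1;
  }

  // The symlink target encodes the previous slave ID, e.g.
  // .../meta/slaves/87b609e3-1591-48da-b7d0-19e77c63e8a7-S0
  const fs::path target = fs::read_symlink(latest, ec);
  if (ec) {
    std::cout << "latest symlink is unreadable => new slave ID\n";
    return 1;
  }

  std::cout << "Previous slave ID directory: " << target.filename().string()
            << " => recovery of tasks/executors can be attempted\n";
  return 0;
}
{code}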
\\
\\
4- The Mesos master removes slave S0 because a new slave has registered at the 
same address. It then sends TASK_LOST to Marathon for task B, *so Marathon 
will launch a new task B on the cluster*.
{noformat}
I0308 11:39:32.244928 31418 master.cpp:3823] Removing old disconnected slave 
87b609e3-1591-48da-b7d0-19e77c63e8a7-S0 at slave(1)@10.195.30.138:5051 
(10.195.30.138) because a registration attempt occurred
I0308 11:39:32.244997 31418 master.cpp:5858] Removing slave 
87b609e3-1591-48da-b7d0-19e77c63e8a7-S0 at slave(1)@10.195.30.138:5051 
(10.195.30.138): a new slave registered at the same address
I0308 11:39:32.245033 31418 master.cpp:6081] Updating the latest state of task 
taskB.e0a381db-dc04-11e5-8744-1e7efa872dee of framework 
87b609e3-1591-48da-b7d0-19e77c63e8a7-0000 to TASK_LOST
I0308 11:39:32.245070 31423 hierarchical.hpp:706] Removed slave 
87b609e3-1591-48da-b7d0-19e77c63e8a7-S0
I0308 11:39:32.245080 31418 master.cpp:6149] Removing task 
taskB.e0a381db-dc04-11e5-8744-1e7efa872dee with resources cpus(*):0.2; 
mem(*):4096; ports(*):[31353-31355] of framework 
87b609e3-1591-48da-b7d0-19e77c63e8a7-0000 on slave 
87b609e3-1591-48da-b7d0-19e77c63e8a7-S0 at slave(1)@10.195.30.138:5051 
(10.195.30.138)
I0308 11:39:32.245825 31418 master.cpp:3862] Registering slave at 
slave(1)@10.195.30.138:5051 (10.195.30.138) with id 
87b609e3-1591-48da-b7d0-19e77c63e8a7-S5
I0308 11:39:32.245837 31419 registrar.cpp:441] Applied 1 operations in 23222ns; 
attempting to update the 'registry'
I0308 11:39:32.246433 31416 log.cpp:685] Attempting to append 872 bytes to the 
log
I0308 11:39:32.246467 31421 coordinator.cpp:341] Coordinator attempting to 
write APPEND action at position 13
I0308 11:39:32.246592 31418 replica.cpp:511] Replica received write request for 
position 13
I0308 11:39:32.246762 31418 leveldb.cpp:343] Persisting action (891 bytes) to 
leveldb took 160036ns
I0308 11:39:32.246772 31418 replica.cpp:679] Persisted action at 13
I0308 11:39:32.322706 31418 replica.cpp:658] Replica received learned notice 
for position 13
I0308 11:39:32.323009 31418 leveldb.cpp:343] Persisting action (893 bytes) to 
leveldb took 170518ns
I0308 11:39:32.323024 31418 replica.cpp:679] Persisted action at 13
I0308 11:39:32.323030 31418 replica.cpp:664] Replica learned APPEND action at 
position 13
I0308 11:39:32.323328 31418 registrar.cpp:486] Successfully updated the 
'registry' in 77472us
I0308 11:39:32.323375 31421 log.cpp:704] Attempting to truncate the log to 13
I0308 11:39:32.323429 31417 coordinator.cpp:341] Coordinator attempting to 
write TRUNCATE action at position 14
I0308 11:39:32.323443 31421 master.cpp:5977] Removed slave 
87b609e3-1591-48da-b7d0-19e77c63e8a7-S0 (10.195.30.138): a new slave registered 
at the same address
I0308 11:39:32.323458 31421 master.cpp:4449] Sending status update TASK_LOST 
(UUID: b07c0530-b87f-0000-c00b-3318b87f0000) for task 
taskB.e0a381db-dc04-11e5-8744-1e7efa872dee of framework 
87b609e3-1591-48da-b7d0-19e77c63e8a7-0000 'Slave 10.195.30.138 removed: a new 
slave registered at the same address'
I0308 11:39:32.323473 31418 registrar.cpp:441] Applied 1 operations in 15996ns; 
attempting to update the 'registry'
I0308 11:39:32.323631 31419 replica.cpp:511] Replica received write request for 
position 14
I0308 11:39:32.323765 31421 master.cpp:6000] Notifying framework 
87b609e3-1591-48da-b7d0-19e77c63e8a7-0000 (marathon) at 
[email protected]:38050 of lost 
slave 87b609e3-1591-48da-b7d0-19e77c63e8a7-S0 (10.195.30.138) after recovering
I0308 11:39:32.323808 31419 leveldb.cpp:343] Persisting action (16 bytes) to 
leveldb took 151481ns
I0308 11:39:32.323830 31419 replica.cpp:679] Persisted action at 14
I0308 11:39:32.324373 31421 replica.cpp:658] Replica received learned notice 
for position 14
I0308 11:39:32.324522 31421 leveldb.cpp:343] Persisting action (18 bytes) to 
leveldb took 130447ns
I0308 11:39:32.324539 31421 leveldb.cpp:401] Deleting ~2 keys from leveldb took 
7410ns
I0308 11:39:32.324545 31421 replica.cpp:679] Persisted action at 14
I0308 11:39:32.324551 31421 replica.cpp:664] Replica learned TRUNCATE action at 
position 14
I0308 11:39:32.324636 31417 log.cpp:685] Attempting to append 1052 bytes to the 
log
I0308 11:39:32.324827 31418 coordinator.cpp:341] Coordinator attempting to 
write APPEND action at position 15
I0308 11:39:32.324964 31420 replica.cpp:511] Replica received write request for 
position 15
I0308 11:39:32.325119 31420 leveldb.cpp:343] Persisting action (1071 bytes) to 
leveldb took 140209ns
I0308 11:39:32.325132 31420 replica.cpp:679] Persisted action at 15
I0308 11:39:32.352208 31420 replica.cpp:658] Replica received learned notice 
for position 15
I0308 11:39:32.355242 31420 leveldb.cpp:343] Persisting action (1073 bytes) to 
leveldb took 2.995861ms
I0308 11:39:32.355274 31420 replica.cpp:679] Persisted action at 15
I0308 11:39:32.355280 31420 replica.cpp:664] Replica learned APPEND action at 
position 15
I0308 11:39:32.355592 31419 registrar.cpp:486] Successfully updated the 
'registry' in 32.098048ms
I0308 11:39:32.355640 31417 log.cpp:704] Attempting to truncate the log to 15
I0308 11:39:32.355690 31418 coordinator.cpp:341] Coordinator attempting to 
write TRUNCATE action at position 16
I0308 11:39:32.355816 31423 replica.cpp:511] Replica received write request for 
position 16
I0308 11:39:32.355866 31422 master.cpp:3930] Registered slave 
87b609e3-1591-48da-b7d0-19e77c63e8a7-S5 at slave(1)@10.195.30.138:5051 
(10.195.30.138) with ports(*):[80-80, 443-443, 31000-33000]; cpus(*):8; 
mem(*):30791; disk(*):46051
I0308 11:39:32.355871 31418 hierarchical.hpp:675] Added slave 
87b609e3-1591-48da-b7d0-19e77c63e8a7-S5 (10.195.30.138) with ports(*):[80-80, 
443-443, 31000-33000]; cpus(*):8; mem(*):30791; disk(*):46051 (allocated: )
I0308 11:39:32.356101 31423 leveldb.cpp:343] Persisting action (16 bytes) to 
leveldb took 256990ns
I0308 11:39:32.356132 31423 replica.cpp:679] Persisted action at 16
I0308 11:39:32.356706 31420 master.cpp:4272] Received update of slave 
87b609e3-1591-48da-b7d0-19e77c63e8a7-S5 at slave(1)@10.195.30.138:5051 
(10.195.30.138) with total oversubscribed resources 
I0308 11:39:32.356724 31421 replica.cpp:658] Replica received learned notice 
for position 16
I0308 11:39:32.356751 31420 hierarchical.hpp:735] Slave 
87b609e3-1591-48da-b7d0-19e77c63e8a7-S5 (10.195.30.138) updated with 
oversubscribed resources  (total: ports(*):[80-80, 443-443, 31000-33000]; 
cpus(*):8; mem(*):30791; disk(*):46051, allocated: ports(*):[80-80, 443-443, 
31000-33000]; cpus(*):8; mem(*):30791; disk(*):46051)
I0308 11:39:32.356995 31421 leveldb.cpp:343] Persisting action (18 bytes) to 
leveldb took 255520ns
I0308 11:39:32.357013 31421 leveldb.cpp:401] Deleting ~2 keys from leveldb took 
7630ns
I0308 11:39:32.357019 31421 replica.cpp:679] Persisted action at 16
I0308 11:39:32.357024 31421 replica.cpp:664] Replica learned TRUNCATE action at 
position 16
{noformat}
\\
\\
5- The Mesos cluster is back to normal, BUT:
* a new Docker container executing task B is running on another slave of the 
cluster;
* the original Docker container of task B is still running on the host of slave S0, 
*and is no longer tracked by Mesos* (see the sketch below).
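One way to spot such leftovers is to list the containers launched by the 
Docker containerizer (they are named with a "mesos-" prefix, as far as I 
know) and compare them with what the slave currently tracks. A rough, 
assumption-laden sketch (shells out to the docker CLI; the name prefix and 
the comparison step are assumptions to verify):
{code:cpp}
// Rough sketch: list Docker containers whose names start with "mesos-",
// i.e. candidates for containers a slave launched but no longer tracks.
// Assumptions: docker CLI on PATH, "mesos-" container-name prefix used by
// the Docker containerizer; verify both before relying on this.
#include <cstdio>
#include <iostream>

int main() {
  // --filter and --format are standard docker CLI options.
  FILE* pipe = popen(
      "docker ps --filter name=mesos- --format '{{.ID}} {{.Names}}'", "r");
  if (pipe == nullptr) {
    std::cerr << "failed to run docker ps\n";
    return 1;
  }

  char line[512];
  while (std::fgets(line, sizeof(line), pipe) != nullptr) {
    // Each line is "<container id> <container name>"; anything listed here
    // that the current slave does not know about is a potential orphan.
    std::cout << line;
  }

  pclose(pipe);
  return 0;
}
{code}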


> Destroy Docker container from Marathon kills Mesos slave
> --------------------------------------------------------
>
>                 Key: MESOS-4827
>                 URL: https://issues.apache.org/jira/browse/MESOS-4827
>             Project: Mesos
>          Issue Type: Bug
>          Components: docker, framework, slave
>    Affects Versions: 0.25.0
>            Reporter: Zhenzhong Shi
>
> The details of this issue were originally [posted on 
> StackOverflow|http://stackoverflow.com/questions/35713985/destroy-docker-container-from-marathon-kills-mesos-slave].
>  
> In short, the problem is that when we destroy/re-deploy a Docker-containerized 
> task, the mesos-slave gets killed from time to time. It happened in our 
> production environment and I can't reproduce it.
> Please refer to the StackOverflow post for the error message I got and the 
> details of the environment.


