-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10951/
-----------------------------------------------------------

(Updated May 6, 2013, 5:29 p.m.)


Review request for mesos.


Description
-------

From d5576303ecaaf3c02eba082c8d5b6cf483e36dae Mon Sep 17 00:00:00 2001
From: Brenden Matthews <[email protected]>
Date: Mon, 6 May 2013 09:54:03 -0700
Subject: [PATCH] Terminate correct tasks when a slave disconnects.

Previously, when a slave disconnected, all tasks belonging to a
non-checkpointing framework were removed, including tasks on other
slaves, which left the framework in a bad state.  In the case of
Hadoop, this left zombie tasks running on the slaves that never
terminate.
---
 src/master/master.cpp |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)


Below is a sample of what the Mesos master log looks like:


I0506 03:01:21.188874  2639 master.cpp:445] Slave 
201305040040-3141079306-5050-1068-21(i-ced4aba2) disconnected
I0506 03:01:21.189184  2639 master.cpp:464] Removing non-checkpointing 
framework 201305040040-4196536586-5050-1124-0000 from disconnected slave 
201305040040-3141079306-5050-1068-21(i-ced4aba2)
I0506 03:01:21.190471  2639 master.hpp:295] Removing task Task_Tracker_46 with 
resources cpus=9; mem=18432; disk=73728; ports=[31000-31000, 32000-32000] on 
slave 201305040040-4196536586-5050-1124-3
I0506 03:01:21.190891  2632 hierarchical_allocator_process.hpp:544] Recovered 
cpus=9; mem=18432; disk=73728; ports=[31000-31000, 32000-32000] (total 
allocatable: cpus=15; mem=19180.2; ports=[31000-32000]; disk=763224) on slave 
201305040040-4196536586-5050-1124-3 from framework 
201305040040-4196536586-5050-1124-0000
I0506 03:01:21.191614  2639 master.hpp:295] Removing task Task_Tracker_154 with 
resources cpus=9; mem=18432; disk=73728; ports=[31000-31000, 32000-32000] on 
slave 201305040040-3141079306-5050-1068-38
I0506 03:01:21.192049  2634 hierarchical_allocator_process.hpp:544] Recovered 
cpus=9; mem=18432; disk=73728; ports=[31000-31000, 32000-32000] (total 
allocatable: cpus=15; mem=19180.2; ports=[31000-32000]; disk=761189) on slave 
201305040040-3141079306-5050-1068-38 from framework 
201305040040-4196536586-5050-1124-0000
I0506 03:01:21.192828  2639 master.hpp:295] Removing task Task_Tracker_195 with 
resources cpus=6.5; mem=13312; disk=53248; ports=[31999-31999, 31001-31001] on 
slave 201305040040-3141079306-5050-1068-85
I0506 03:01:21.193270  2640 hierarchical_allocator_process.hpp:544] Recovered 
cpus=6.5; mem=13312; disk=53248; ports=[31999-31999, 31001-31001] (total 
allocatable: cpus=10; mem=13408.8; ports=[31001-31999]; disk=596893) on slave 
201305040040-3141079306-5050-1068-85 from framework 
201305040040-4196536586-5050-1124-0000
I0506 03:01:21.194039  2639 master.hpp:295] Removing task Task_Tracker_182 with 
resources cpus=9; mem=18432; disk=73728; ports=[31000-31000, 32000-32000] on 
slave 201305040040-3141079306-5050-1068-45
I0506 03:01:21.194425  2638 hierarchical_allocator_process.hpp:544] Recovered 
cpus=9; mem=18432; disk=73728; ports=[31000-31000, 32000-32000] (total 
allocatable: cpus=15; mem=19180.2; ports=[31000-32000]; disk=760196) on slave 
201305040040-3141079306-5050-1068-45 from framework 
201305040040-4196536586-5050-1124-0000
I0506 03:01:21.195190  2639 master.hpp:295] Removing task Task_Tracker_58 with 
resources cpus=9; mem=18432; disk=73728; ports=[31000-31000, 32000-32000] on 
slave 201305040040-3141079306-5050-1068-76
I0506 03:01:21.195636  2636 hierarchical_allocator_process.hpp:544] Recovered 
cpus=9; mem=18432; disk=73728; ports=[31000-31000, 32000-32000] (total 
allocatable: cpus=15; mem=19180.2; ports=[31000-32000]; disk=761175) on slave 
201305040040-3141079306-5050-1068-76 from framework 
201305040040-4196536586-5050-1124-0000
I0506 03:01:21.196455  2639 master.hpp:295] Removing task Task_Tracker_160 with 
resources cpus=20; mem=40960; disk=163840; ports=[31000-31000, 32000-32000] on 
slave 201305040040-3141079306-5050-1068-85
I0506 03:01:21.196883  2631 hierarchical_allocator_process.hpp:544] Recovered 
cpus=20; mem=40960; disk=163840; ports=[31000-31000, 32000-32000] (total 
allocatable: cpus=30; mem=54368.8; ports=[31000-32000]; disk=760733) on slave 
201305040040-3141079306-5050-1068-85 from framework 
201305040040-4196536586-5050-1124-0000
I0506 03:01:21.197710  2639 master.hpp:295] Removing task Task_Tracker_96 with 
resources cpus=3.5; mem=7168; disk=28672; ports=[31000-31000, 32000-32000] on 
slave 201305040040-3141079306-5050-1068-80
<...log continues...>
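
In the log above, slave 201305040040-3141079306-5050-1068-21 disconnects,
yet the tasks being removed are on other slaves (-3, -38, -85, and so on).
The intended behaviour can be sketched roughly as below. This is only an
illustration under stated assumptions: `Task` and `tasksToRemove` are
hypothetical, simplified names, not the Mesos master's actual types and not
the contents of this patch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a task record; not the Mesos type.
struct Task {
  std::string id;       // e.g. "Task_Tracker_46"
  std::string slaveId;  // ID of the slave the task is running on
};

// Returns only the tasks that run on the disconnected slave. Tasks the
// framework has on other slaves are left alone.
std::vector<Task> tasksToRemove(const std::vector<Task>& frameworkTasks,
                                const std::string& disconnectedSlaveId) {
  assert(!disconnectedSlaveId.empty());

  std::vector<Task> result;
  for (const Task& task : frameworkTasks) {
    if (task.slaveId == disconnectedSlaveId) {
      result.push_back(task);
    }
  }
  return result;
}
```

With a filter like this, a task such as Task_Tracker_46 on slave
201305040040-4196536586-5050-1124-3 would not be selected when slave -21
disconnects.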


Diffs (updated)
-----

  src/master/master.hpp d3790dc 
  src/master/master.cpp 3207157 

Diff: https://reviews.apache.org/r/10951/diff/


Testing
-------

Used in production at Airbnb.


Thanks,

Brenden Matthews
