-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10951/#review20197
-----------------------------------------------------------


src/master/master.hpp
<https://reviews.apache.org/r/10951/#comment41445>

    We prefer shorter names for functions. I think the signature here
    already hints that it is being removed from the specified slave.


src/master/master.cpp
<https://reviews.apache.org/r/10951/#comment41446>

    We don't want to remove all the tasks on this slave! Only those
    that belong to this framework. So:

    foreachvalue (Task* task, utils::copy(slave->tasks)) {
      // Remove the task if it belongs to the framework
      // being removed.
      if (task->framework_id() == framework->id) {
        ...
        ...
      }
    }


- Vinod Kone


On May 6, 2013, 5:29 p.m., Brenden Matthews wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10951/
> -----------------------------------------------------------
>
> (Updated May 6, 2013, 5:29 p.m.)
>
>
> Review request for mesos.
>
>
> Description
> -------
>
> From d5576303ecaaf3c02eba082c8d5b6cf483e36dae Mon Sep 17 00:00:00 2001
> From: Brenden Matthews <[email protected]>
> Date: Mon, 6 May 2013 09:54:03 -0700
> Subject: [PATCH] Terminate correct tasks when a slave disconnects.
>
> Previously, when a slave disconnected all tasks for that framework would
> be removed and it would result in a bad state for a given framework. In
> the case of Hadoop, it would result in a bunch of zombie tasks running
> on the slaves which never terminate.
> ---
> src/master/master.cpp | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
>
> Below is a sample of what the Mesos master log looks like:
>
>
> I0506 03:01:21.188874 2639 master.cpp:445] Slave
> 201305040040-3141079306-5050-1068-21(i-ced4aba2) disconnected
> I0506 03:01:21.189184 2639 master.cpp:464] Removing non-checkpointing
> framework 201305040040-4196536586-5050-1124-0000 from disconnected
> slave 201305040040-3141079306-5050-1068-21(i-ced4aba2)
> I0506 03:01:21.190471 2639 master.hpp:295] Removing task Task_Tracker_46
> with resources cpus=9; mem=18432; disk=73728; ports=[31000-31000,
> 32000-32000] on slave 201305040040-4196536586-5050-1124-3
> I0506 03:01:21.190891 2632 hierarchical_allocator_process.hpp:544] Recovered
> cpus=9; mem=18432; disk=73728; ports=[31000-31000, 32000-32000] (total
> allocatable: cpus=15; mem=19180.2; ports=[31000-32000]; disk=763224) on slave
> 201305040040-4196536586-5050-1124-3 from framework
> 201305040040-4196536586-5050-1124-0000
> I0506 03:01:21.191614 2639 master.hpp:295] Removing task Task_Tracker_154
> with resources cpus=9; mem=18432; disk=73728; ports=[31000-31000,
> 32000-32000] on slave 201305040040-3141079306-5050-1068-38
> I0506 03:01:21.192049 2634 hierarchical_allocator_process.hpp:544] Recovered
> cpus=9; mem=18432; disk=73728; ports=[31000-31000, 32000-32000] (total
> allocatable: cpus=15; mem=19180.2; ports=[31000-32000]; disk=761189) on slave
> 201305040040-3141079306-5050-1068-38 from framework
> 201305040040-4196536586-5050-1124-0000
> I0506 03:01:21.192828 2639 master.hpp:295] Removing task Task_Tracker_195
> with resources cpus=6.5; mem=13312; disk=53248; ports=[31999-31999,
> 31001-31001] on slave 201305040040-3141079306-5050-1068-85
> I0506 03:01:21.193270 2640 hierarchical_allocator_process.hpp:544] Recovered
> cpus=6.5; mem=13312; disk=53248; ports=[31999-31999, 31001-31001] (total
> allocatable: cpus=10; mem=13408.8; ports=[31001-31999]; disk=596893) on slave
> 201305040040-3141079306-5050-1068-85 from framework
> 201305040040-4196536586-5050-1124-0000
> I0506 03:01:21.194039 2639 master.hpp:295] Removing task Task_Tracker_182
> with resources cpus=9; mem=18432; disk=73728; ports=[31000-31000,
> 32000-32000] on slave 201305040040-3141079306-5050-1068-45
> I0506 03:01:21.194425 2638 hierarchical_allocator_process.hpp:544] Recovered
> cpus=9; mem=18432; disk=73728; ports=[31000-31000, 32000-32000] (total
> allocatable: cpus=15; mem=19180.2; ports=[31000-32000]; disk=760196) on slave
> 201305040040-3141079306-5050-1068-45 from framework
> 201305040040-4196536586-5050-1124-0000
> I0506 03:01:21.195190 2639 master.hpp:295] Removing task Task_Tracker_58
> with resources cpus=9; mem=18432; disk=73728; ports=[31000-31000,
> 32000-32000] on slave 201305040040-3141079306-5050-1068-76
> I0506 03:01:21.195636 2636 hierarchical_allocator_process.hpp:544] Recovered
> cpus=9; mem=18432; disk=73728; ports=[31000-31000, 32000-32000] (total
> allocatable: cpus=15; mem=19180.2; ports=[31000-32000]; disk=761175) on slave
> 201305040040-3141079306-5050-1068-76 from framework
> 201305040040-4196536586-5050-1124-0000
> I0506 03:01:21.196455 2639 master.hpp:295] Removing task Task_Tracker_160
> with resources cpus=20; mem=40960; disk=163840; ports=[31000-31000,
> 32000-32000] on slave 201305040040-3141079306-5050-1068-85
> I0506 03:01:21.196883 2631 hierarchical_allocator_process.hpp:544] Recovered
> cpus=20; mem=40960; disk=163840; ports=[31000-31000, 32000-32000] (total
> allocatable: cpus=30; mem=54368.8; ports=[31000-32000]; disk=760733) on slave
> 201305040040-3141079306-5050-1068-85 from framework
> 201305040040-4196536586-5050-1124-0000
> I0506 03:01:21.197710 2639 master.hpp:295] Removing task Task_Tracker_96
> with resources cpus=3.5; mem=7168; disk=28672; ports=[31000-31000,
> 32000-32000] on slave 201305040040-3141079306-5050-1068-80
> <...log continues...>
>
>
> Diffs
> -----
>
> src/master/master.hpp d3790dc
> src/master/master.cpp 3207157
>
> Diff: https://reviews.apache.org/r/10951/diff/
>
>
> Testing
> -------
>
> Used in production at airbnb.
>
>
> Thanks,
>
> Brenden Matthews
>
>
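
For reference, the filtering suggested in the review comment on src/master/master.cpp can be sketched in standalone C++. This is only a minimal sketch under stated assumptions, not the Mesos code: Task, Slave, and removeFrameworkTasks are hypothetical stand-ins, and a std::map plus an explicit copy replaces the master's own containers, the foreachvalue macro, and utils::copy. The elided loop body from the review ("...") is not reconstructed here; the sketch simply erases the matching entry.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for the master's bookkeeping (not the Mesos types).
struct Task {
  std::string id;
  std::string frameworkId;
};

struct Slave {
  std::map<std::string, Task> tasks;  // task id -> task
};

// Removes from 'slave' only the tasks owned by 'frameworkId' and
// returns the ids of the removed tasks, in key order.
std::vector<std::string> removeFrameworkTasks(
    Slave& slave, const std::string& frameworkId)
{
  std::vector<std::string> removed;

  // Iterate over a copy so that erasing from slave.tasks inside the
  // loop cannot invalidate the iteration.
  const std::map<std::string, Task> snapshot = slave.tasks;

  for (const auto& entry : snapshot) {
    // Skip tasks that belong to other frameworks on this slave.
    if (entry.second.frameworkId == frameworkId) {
      slave.tasks.erase(entry.first);
      removed.push_back(entry.first);
    }
  }

  return removed;
}
```

Calling removeFrameworkTasks on a slave that holds tasks from two frameworks should leave the other framework's tasks in place, which is the behavior the review asks for.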
