-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10951/
-----------------------------------------------------------
(Updated May 6, 2013, 8:08 p.m.)


Review request for mesos.


Description
-------

From d01482457f02acc1e19195995db7a14dfc2a89b9 Mon Sep 17 00:00:00 2001
From: Brenden Matthews <[email protected]>
Date: Mon, 6 May 2013 09:54:03 -0700
Subject: [PATCH] Terminate correct tasks when a slave disconnects.

Previously, when a slave disconnected all tasks for that framework would
be removed and it would result in a bad state for a given framework. In
the case of Hadoop, it would result in a bunch of zombie tasks running
on the slaves which never terminate.

Added some `operator !=' type utilities.
---
 src/common/type_utils.hpp | 66 +++++++++++++++++++++++++++++++++++++++++++++
 src/master/master.cpp     |  8 ++++--
 2 files changed, 72 insertions(+), 2 deletions(-)

Below is a sample of what the Mesos master log looks like:

I0506 03:01:21.188874 2639 master.cpp:445] Slave 201305040040-3141079306-5050-1068-21(i-ced4aba2) disconnected
I0506 03:01:21.189184 2639 master.cpp:464] Removing non-checkpointing framework 201305040040-4196536586-5050-1124-0000 from disconnected slave 201305040040-3141079306-5050-1068-21(i-ced4aba2)
I0506 03:01:21.190471 2639 master.hpp:295] Removing task Task_Tracker_46 with resources cpus=9; mem=18432; disk=73728; ports=[31000-31000, 32000-32000] on slave 201305040040-4196536586-5050-1124-3
I0506 03:01:21.190891 2632 hierarchical_allocator_process.hpp:544] Recovered cpus=9; mem=18432; disk=73728; ports=[31000-31000, 32000-32000] (total allocatable: cpus=15; mem=19180.2; ports=[31000-32000]; disk=763224) on slave 201305040040-4196536586-5050-1124-3 from framework 201305040040-4196536586-5050-1124-0000
I0506 03:01:21.191614 2639 master.hpp:295] Removing task Task_Tracker_154 with resources cpus=9; mem=18432; disk=73728; ports=[31000-31000, 32000-32000] on slave 201305040040-3141079306-5050-1068-38
I0506 03:01:21.192049 2634 hierarchical_allocator_process.hpp:544] Recovered cpus=9; mem=18432; disk=73728; ports=[31000-31000, 32000-32000] (total allocatable: cpus=15; mem=19180.2; ports=[31000-32000]; disk=761189) on slave 201305040040-3141079306-5050-1068-38 from framework 201305040040-4196536586-5050-1124-0000
I0506 03:01:21.192828 2639 master.hpp:295] Removing task Task_Tracker_195 with resources cpus=6.5; mem=13312; disk=53248; ports=[31999-31999, 31001-31001] on slave 201305040040-3141079306-5050-1068-85
I0506 03:01:21.193270 2640 hierarchical_allocator_process.hpp:544] Recovered cpus=6.5; mem=13312; disk=53248; ports=[31999-31999, 31001-31001] (total allocatable: cpus=10; mem=13408.8; ports=[31001-31999]; disk=596893) on slave 201305040040-3141079306-5050-1068-85 from framework 201305040040-4196536586-5050-1124-0000
I0506 03:01:21.194039 2639 master.hpp:295] Removing task Task_Tracker_182 with resources cpus=9; mem=18432; disk=73728; ports=[31000-31000, 32000-32000] on slave 201305040040-3141079306-5050-1068-45
I0506 03:01:21.194425 2638 hierarchical_allocator_process.hpp:544] Recovered cpus=9; mem=18432; disk=73728; ports=[31000-31000, 32000-32000] (total allocatable: cpus=15; mem=19180.2; ports=[31000-32000]; disk=760196) on slave 201305040040-3141079306-5050-1068-45 from framework 201305040040-4196536586-5050-1124-0000
I0506 03:01:21.195190 2639 master.hpp:295] Removing task Task_Tracker_58 with resources cpus=9; mem=18432; disk=73728; ports=[31000-31000, 32000-32000] on slave 201305040040-3141079306-5050-1068-76
I0506 03:01:21.195636 2636 hierarchical_allocator_process.hpp:544] Recovered cpus=9; mem=18432; disk=73728; ports=[31000-31000, 32000-32000] (total allocatable: cpus=15; mem=19180.2; ports=[31000-32000]; disk=761175) on slave 201305040040-3141079306-5050-1068-76 from framework 201305040040-4196536586-5050-1124-0000
I0506 03:01:21.196455 2639 master.hpp:295] Removing task Task_Tracker_160 with resources cpus=20; mem=40960; disk=163840; ports=[31000-31000, 32000-32000] on slave 201305040040-3141079306-5050-1068-85
I0506 03:01:21.196883 2631 hierarchical_allocator_process.hpp:544] Recovered cpus=20; mem=40960; disk=163840; ports=[31000-31000, 32000-32000] (total allocatable: cpus=30; mem=54368.8; ports=[31000-32000]; disk=760733) on slave 201305040040-3141079306-5050-1068-85 from framework 201305040040-4196536586-5050-1124-0000
I0506 03:01:21.197710 2639 master.hpp:295] Removing task Task_Tracker_96 with resources cpus=3.5; mem=7168; disk=28672; ports=[31000-31000, 32000-32000] on slave 201305040040-3141079306-5050-1068-80
<...log continues...>


Diffs (updated)
-----

  src/master/master.cpp 3207157

Diff: https://reviews.apache.org/r/10951/diff/


Testing
-------

Used in production at airbnb.


Thanks,

Brenden Matthews
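

Appendix: sketch of the intended behavior
-------

In the sample log above, slave ...-1068-21 disconnects, yet tasks are removed on slaves ...-1068-38, ...-1068-85 and others. The intent of the patch, as the description puts it, is to remove only the tasks that ran on the disconnected slave. The following is a minimal sketch of that filtering, not the actual diff: the `Task` struct, its fields, and `tasksToRemove` are hypothetical stand-ins for illustration and are not Mesos types or functions. The `operator!=` is defined in terms of `operator==`, which is one common way to write such utilities; whether type_utils.hpp does the same is an assumption.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a task record.
// Not the actual Mesos Task message.
struct Task {
  std::string taskId;
  std::string frameworkId;
  std::string slaveId;
};

// Equality over all identifying fields.
inline bool operator==(const Task& left, const Task& right) {
  return left.taskId == right.taskId &&
         left.frameworkId == right.frameworkId &&
         left.slaveId == right.slaveId;
}

// Inequality expressed through equality, so the two can never disagree.
inline bool operator!=(const Task& left, const Task& right) {
  return !(left == right);
}

// Hypothetical helper: selects the tasks to remove when `slaveId`
// disconnects. Only tasks that belong to `frameworkId` AND run on the
// disconnected slave are returned; the framework's tasks on other
// slaves are left alone.
std::vector<Task> tasksToRemove(const std::vector<Task>& tasks,
                                const std::string& frameworkId,
                                const std::string& slaveId) {
  std::vector<Task> result;
  for (const Task& task : tasks) {
    if (task.frameworkId != frameworkId) {
      continue;  // Task belongs to a different framework.
    }
    if (task.slaveId != slaveId) {
      continue;  // Same framework, but running on a healthy slave.
    }
    result.push_back(task);
  }
  return result;
}
```

With this check, a framework whose tasks are spread over many slaves loses only the tasks on the slave that went away, so the master's view stays consistent with what is still running elsewhere.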
