-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10951/#review20194
-----------------------------------------------------------



src/master/master.cpp
<https://reviews.apache.org/r/10951/#comment41437>

    Thank you.



src/master/master.cpp
<https://reviews.apache.org/r/10951/#comment41442>

    Wow, this really is a bug. Thanks for catching this!
    
    I think a better way to do this is to change the foreach loop (#1776) to
    loop through the slave's tasks instead of the framework's tasks (which can be
    huge!). Inside the loop we can check whether each task belongs to the
    framework being removed. Does that make sense?
    
    Also, we always use braces around if/for statements.
    


- Vinod Kone


On May 6, 2013, 4:59 p.m., Brenden Matthews wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10951/
> -----------------------------------------------------------
> 
> (Updated May 6, 2013, 4:59 p.m.)
> 
> 
> Review request for mesos.
> 
> 
> Description
> -------
> 
> From d5576303ecaaf3c02eba082c8d5b6cf483e36dae Mon Sep 17 00:00:00 2001
> From: Brenden Matthews <[email protected]>
> Date: Mon, 6 May 2013 09:54:03 -0700
> Subject: [PATCH] Terminate correct tasks when a slave disconnects.
> 
> Previously, when a slave disconnected, all of a non-checkpointing
> framework's tasks would be removed, including those running on other
> slaves, which left the framework in a bad state. In the case of Hadoop,
> this resulted in a bunch of zombie tasks running on the slaves that never
> terminated.
> ---
>  src/master/master.cpp |    6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> 
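> Below is a sample of the Mesos master log. After slave
> 201305040040-3141079306-5050-1068-21 disconnects, the master also removes
> the framework's tasks on other slaves: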
> 
> 
> I0506 03:01:21.188874  2639 master.cpp:445] Slave 
> 201305040040-3141079306-5050-1068-21(i-ced4aba2) disconnected
> I0506 03:01:21.189184  2639 master.cpp:464] Removing non-checkpointing 
> framework 201305040040-4196536586-5050-1124-0000 from disconnected 
> slave 201305040040-3141079306-5050-1068-21(i-ced4aba2)
> I0506 03:01:21.190471  2639 master.hpp:295] Removing task Task_Tracker_46 
> with resources cpus=9; mem=18432; disk=73728; ports=[31000-31000, 
> 32000-32000] on slave 201305040040-4196536586-5050-1124-3
> I0506 03:01:21.190891  2632 hierarchical_allocator_process.hpp:544] Recovered 
> cpus=9; mem=18432; disk=73728; ports=[31000-31000, 32000-32000] (total 
> allocatable: cpus=15; mem=19180.2; ports=[31000-32000]; disk=763224) on slave 
> 201305040040-4196536586-5050-1124-3 from framework 
> 201305040040-4196536586-5050-1124-0000
> I0506 03:01:21.191614  2639 master.hpp:295] Removing task Task_Tracker_154 
> with resources cpus=9; mem=18432; disk=73728; ports=[31000-31000, 
> 32000-32000] on slave 201305040040-3141079306-5050-1068-38
> I0506 03:01:21.192049  2634 hierarchical_allocator_process.hpp:544] Recovered 
> cpus=9; mem=18432; disk=73728; ports=[31000-31000, 32000-32000] (total 
> allocatable: cpus=15; mem=19180.2; ports=[31000-32000]; disk=761189) on slave 
> 201305040040-3141079306-5050-1068-38 from framework 
> 201305040040-4196536586-5050-1124-0000
> I0506 03:01:21.192828  2639 master.hpp:295] Removing task Task_Tracker_195 
> with resources cpus=6.5; mem=13312; disk=53248; ports=[31999-31999, 
> 31001-31001] on slave 201305040040-3141079306-5050-1068-85
> I0506 03:01:21.193270  2640 hierarchical_allocator_process.hpp:544] Recovered 
> cpus=6.5; mem=13312; disk=53248; ports=[31999-31999, 31001-31001] (total 
> allocatable: cpus=10; mem=13408.8; ports=[31001-31999]; disk=596893) on slave 
> 201305040040-3141079306-5050-1068-85 from framework 
> 201305040040-4196536586-5050-1124-0000
> I0506 03:01:21.194039  2639 master.hpp:295] Removing task Task_Tracker_182 
> with resources cpus=9; mem=18432; disk=73728; ports=[31000-31000, 
> 32000-32000] on slave 201305040040-3141079306-5050-1068-45
> I0506 03:01:21.194425  2638 hierarchical_allocator_process.hpp:544] Recovered 
> cpus=9; mem=18432; disk=73728; ports=[31000-31000, 32000-32000] (total 
> allocatable: cpus=15; mem=19180.2; ports=[31000-32000]; disk=760196) on slave 
> 201305040040-3141079306-5050-1068-45 from framework 
> 201305040040-4196536586-5050-1124-0000
> I0506 03:01:21.195190  2639 master.hpp:295] Removing task Task_Tracker_58 
> with resources cpus=9; mem=18432; disk=73728; ports=[31000-31000, 
> 32000-32000] on slave 201305040040-3141079306-5050-1068-76
> I0506 03:01:21.195636  2636 hierarchical_allocator_process.hpp:544] Recovered 
> cpus=9; mem=18432; disk=73728; ports=[31000-31000, 32000-32000] (total 
> allocatable: cpus=15; mem=19180.2; ports=[31000-32000]; disk=761175) on slave 
> 201305040040-3141079306-5050-1068-76 from framework 
> 201305040040-4196536586-5050-1124-0000
> I0506 03:01:21.196455  2639 master.hpp:295] Removing task Task_Tracker_160 
> with resources cpus=20; mem=40960; disk=163840; ports=[31000-31000, 
> 32000-32000] on slave 201305040040-3141079306-5050-1068-85
> I0506 03:01:21.196883  2631 hierarchical_allocator_process.hpp:544] Recovered 
> cpus=20; mem=40960; disk=163840; ports=[31000-31000, 32000-32000] (total 
> allocatable: cpus=30; mem=54368.8; ports=[31000-32000]; disk=760733) on slave 
> 201305040040-3141079306-5050-1068-85 from framework 
> 201305040040-4196536586-5050-1124-0000
> I0506 03:01:21.197710  2639 master.hpp:295] Removing task Task_Tracker_96 
> with resources cpus=3.5; mem=7168; disk=28672; ports=[31000-31000, 
> 32000-32000] on slave 201305040040-3141079306-5050-1068-80
> <...log continues...>
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp 3207157 
> 
> Diff: https://reviews.apache.org/r/10951/diff/
> 
> 
> Testing
> -------
> 
> Used in production at Airbnb.
> 
> 
> Thanks,
> 
> Brenden Matthews
> 
>
