> On Jan. 24, 2013, 5:22 p.m., Charles Reiss wrote:
> > src/slave/slave.cpp, line 1084
> > <https://reviews.apache.org/r/9095/diff/1/?file=251559#file251559line1084>
> >
> >     Aren't you going to be missing the last resource sample for the 
> > executor (perhaps the only one for, e.g., a crash-looping executor)?
> 
> Charles Reiss wrote:
>     Okay, sorry, I really should have looked more at what archive() did 
> before writing that. However, I think you have a problem with the 
> frameworkId/executorId pairs not being unique over a window when an executor 
> gets restarted with the same ID (again, a crash loop is the obvious case 
> where this is likely).
> 
> Ben Mahler wrote:
>     Good point, there's definitely a bug here:
>     
>     -Executor 1 terminates
>     -Archive stats for Executor 1
>     -Executor 1 runs again on the same slave
>     -We collect and export resource usage to STATS for Executor 1.
>     
>     Now, Executor 1 incorrectly remains archived, and while it will show up 
> in the usage.json endpoint, it will never show up again in the statistics 
> snapshot.json.
>     The fix here is to ensure that, when new statistics come in for an 
> executor, it is no longer archived. I'll make that fix in 
> https://reviews.apache.org/r/9093/
>     
>     Were there any other issues here?

I didn't see anything else broken, though I didn't look very hard.
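
For illustration, the sequence Ben lists above and the fix he proposes could 
be sketched roughly as follows. This is a minimal sketch under stated 
assumptions, not the actual src/slave/monitor.cpp: the Monitor class, the 
ExecutorKey alias, and the record()/archive()/isArchived() names are all 
hypothetical.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical stand-in for the (frameworkId, executorId) pair.
using ExecutorKey = std::pair<std::string, std::string>;

// Hypothetical, simplified monitor (an assumption for illustration only).
class Monitor
{
public:
  // Called when an executor terminates: its last sample is kept,
  // but the executor is marked as archived.
  void archive(const ExecutorKey& key)
  {
    archived.insert(key);
  }

  // Called for every new resource sample.
  void record(const ExecutorKey& key, double cpuSeconds)
  {
    // The fix: a fresh sample means an executor with this ID is
    // running again, so it must no longer be treated as archived.
    archived.erase(key);
    latest[key] = cpuSeconds;
  }

  bool isArchived(const ExecutorKey& key) const
  {
    return archived.count(key) > 0;
  }

private:
  std::map<ExecutorKey, double> latest;
  std::set<ExecutorKey> archived;
};
```

Without the erase() in record(), a restarted executor with the same ID would 
stay archived, which is the bug described above.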

I would have preferred (and expected) statistics to be kept separately for 
each executor attempt (e.g. keyed by the slave's UUID, which would likely 
require an IsolationModule API change), but it's not a big deal.
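
A rough sketch of what per-attempt keying might look like, assuming the UUID 
is a per-launch identifier generated by the slave (that reading, and the 
names RunKey, runId, Sample and PerRunStatistics, are assumptions for 
illustration and not part of the Mesos API):

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>

// Hypothetical key: (frameworkId, executorId, runId), where runId is
// assumed to be a unique ID assigned each time the executor is launched.
using RunKey = std::tuple<std::string, std::string, std::string>;

struct Sample
{
  double cpuSeconds = 0.0;
  bool archived = false;
};

// Hypothetical store that keeps one entry per executor attempt.
class PerRunStatistics
{
public:
  // Store the latest sample for this attempt; the archived flag is untouched.
  void record(const RunKey& key, double cpuSeconds)
  {
    samples[key].cpuSeconds = cpuSeconds;
  }

  // Mark a terminated attempt as archived, if it has any samples.
  void archive(const RunKey& key)
  {
    auto it = samples.find(key);
    if (it != samples.end()) {
      it->second.archived = true;
    }
  }

  const std::map<RunKey, Sample>& all() const
  {
    return samples;
  }

private:
  std::map<RunKey, Sample> samples;
};
```

With this layout a restarted executor gets a new entry, so the archived 
statistics of the earlier attempt never collide with the live ones.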


- Charles


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9095/#review15645
-----------------------------------------------------------


On Jan. 24, 2013, 9:22 a.m., Ben Mahler wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9095/
> -----------------------------------------------------------
> 
> (Updated Jan. 24, 2013, 9:22 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Vinod Kone.
> 
> 
> Description
> -------
> 
> This wires up the archival of terminated executor stats.
> 
> 
> This addresses bug MESOS-324.
>     https://issues.apache.org/jira/browse/MESOS-324
> 
> 
> Diffs
> -----
> 
>   src/slave/monitor.hpp PRE-CREATION 
>   src/slave/monitor.cpp PRE-CREATION 
>   src/slave/slave.cpp 9755b46f97173d6fcc9ab1fd63e0e4814b3bc018 
> 
> Diff: https://reviews.apache.org/r/9095/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Ben Mahler
> 
>
