-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25614/#review53408
-----------------------------------------------------------

Ship it!


Ship It!

- Niklas Nielsen


On Sept. 15, 2014, 1:54 p.m., Connor Doyle wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25614/
> -----------------------------------------------------------
> 
> (Updated Sept. 15, 2014, 1:54 p.m.)
> 
> 
> Review request for mesos and Niklas Nielsen.
> 
> 
> Bugs: MESOS-1795
>     https://issues.apache.org/jira/browse/MESOS-1795
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> ## OVERVIEW
> 
> This change adds a measure of safety for calls to `Future<T>::await()`
> without a timeout from within the JNI interface to the state
> abstraction.
> 
> ## MOTIVATION
> 
> Observed the following log output prior to a crash of the Marathon scheduler:
> 
> ```
> Sep 12 23:46:01 highly-available-457-540 marathon[11494]: F0912 
> 23:46:01.771927 11532 org_apache_mesos_state_AbstractState.cpp:145] 
> CHECK_READY(*future): is PENDING 
> Sep 12 23:46:01 highly-available-457-540 marathon[11494]: *** Check failure 
> stack trace: ***
> Sep 12 23:46:01 highly-available-457-540 marathon[11494]:     @     
> 0x7febc2663a2d  google::LogMessage::Fail()
> Sep 12 23:46:01 highly-available-457-540 marathon[11494]:     @     
> 0x7febc26657e3  google::LogMessage::SendToLog()
> Sep 12 23:46:01 highly-available-457-540 marathon[11494]:     @     
> 0x7febc2663648  google::LogMessage::Flush()
> Sep 12 23:46:01 highly-available-457-540 marathon[11494]:     @     
> 0x7febc266603e  google::LogMessageFatal::~LogMessageFatal()
> Sep 12 23:46:01 highly-available-457-540 marathon[11494]:     @     
> 0x7febc26588a3  Java_org_apache_mesos_state_AbstractState__1_1fetch_1get
> Sep 12 23:46:01 highly-available-457-540 marathon[11494]:     @     
> 0x7febcd107d98  (unknown)
> ```
> _Listing 1: Crash log output._
> 
> ## CHANGES IN DETAIL
> 
> The in-line comments in `Latch::await(const Duration& duration)`, mention
> that it's possible for the call to return before the underlying process
> has completed for two reasons:
> 
> ```
> ...
> process::wait(pid, duration); // Explict to disambiguate.
> // It's possible that we failed to wait because:
> //   (1) Our process has already terminated.
> //   (2) We timed out (i.e., duration was not "infinite").
> ...
> ```
> _Listing 2: Excerpt from 3rdparty/src/libprocess/latch.cpp._
> 
> Call sites within `org_apache_mesos_state_AbstractState.cpp`
> that formerly discarded the return value now throw a
> `java.concurrent.ExecutionException` if the await fails.  I chose to throw
> this instead of a `java.concurrent.TimeoutException` since the expectation
> from the caller's perspective is to block forever pending a ready state.
> 
> 
> Diffs
> -----
> 
>   src/java/jni/org_apache_mesos_state_AbstractState.cpp 1accc8a 
> 
> Diff: https://reviews.apache.org/r/25614/diff/
> 
> 
> Testing
> -------
> 
> - make
> - make check
> 
> 
> Thanks,
> 
> Connor Doyle
> 
>

Reply via email to