Re: Review Request 29507: Added Configurable Slave Ping Timeouts

Ben Mahler Fri, 09 Jan 2015 15:40:59 -0800


> On Jan. 7, 2015, 10:50 p.m., Ben Mahler wrote:
> > src/master/flags.hpp, lines 361-373
> > <https://reviews.apache.org/r/29507/diff/1/?file=804645#file804645line361>
> >
> >     Do you mind changing the names now that they're merely defaults?
> >     
> >     s/SLAVE_PING_TIMEOUT/DEFAULT_SLAVE_PING_TIMEOUT/
> >     s/MAX_SLAVE_PING_TIMEOUTS/DEFAULT_MAX_SLAVE_PING_TIMEOUTS/
> >     
> >     Once you do this, it will become apparent there is more needed in this 
> > change. In particular, all the slave logic that depends on 
> > MASTER_PING_TIMEOUT() is broken now! Making it configurable will require 
> > some more thought.
> 
> Adam B wrote:
>     Renamed to make the DEFAULT nature explicit.
>     
>     I would argue that with this change as is, the slave side of the ping 
> timeout isn't much more broken than it already was. As I alluded to in the 
> MESOS-2110 JIRA description, the slave uses the timeout to force a 
> leader-detection & re-registration, which is only loosely related to what the 
> master is doing with the timeout (declaring the slave shutdown). Even without 
> my changes, the Timers on master/slave could be out of sync and the two 
> timeout events aren't guaranteed to happen in a particular order. If the 
> master were to timeout first, and the slave did eventually detect a new 
> leader, the slave would still be told to shutdown and its re-registration 
> would fail. If the slave timed out first, it would try to re-register with 
> the currently verified leader (could be the same), and the master's 
> SlaveObserver would keep trying to ping the slave and potentially eventually 
> shut it down. That's why I would argue that this patch is fine on its own, 
> and a subsequent patch (same JIRA?) sh
 ould clean up the slave's ping timeout behavior.
>     
>     `src/master/constants.hpp` has the following note:
>     ```
>     // NOTE: The slave uses these PING constants to determine when
>     // the master has stopped sending pings. If these are made
>     // configurable, then we'll need to rely on upper/lower bounds
>     // to ensure that the slave is not unnecessarily triggering
>     // re-registrations.
>     ```
>     If the master flags are configurable, there's currently no way for the 
> slave to know what value the current leading master is using, especially if 
> the slave is disconnected at the time. After reading through the comments in 
> MESOS-1529 (which added the slave-side ping timeout), I'm thinking that maybe 
> the master->slave registered (and PING?) messages could embed the current 
> master's actual ping timeout. Then the pingTimers started from 
> Slave::registered() and Slave::ping() could use that value (same as master's) 
> to prevent unnecessary reregistration due to a timeout mismatch.
>     
>     Thoughts, suggestions?

> I would argue that with this change as is, the slave side of the ping timeout 
> isn't much more broken than it already was.

Hm.. I think you misunderstood the intent of the slave using this timeout (it's 
not trying to prevent the things you discussed, please clarify if you think 
it's causing some issues, I didn't quite understand what you're pointing out).

These timers are in place to guarantee that if the master removes a slave 
(stops health checking it), then the slave will eventually terminate. Without 
this timeout, it's possible for the master -> slave removal messages to drop 
while the slave -> master socket remains intact. At this point, (without the 
timeout) the slave would remain running potentially forever (unless a message 
(e.g. status update) is sent to the master for the master to detect the zombied 
slave).

The slave also uses the pings to prevent a slave from getting "stuck" in a 
situation where the master considers the slave disconnected, but the slave 
considers itself connected. This can happen via a one-way socket closure from 
master -> slave. Note that the health checking in the master continues while 
the slave is disconnected.

The comments here should help:

```
void Slave::ping(const UPID& from, bool connected)
{
  VLOG(1) << "Received ping from " << from;

  if (!connected && state == RUNNING) {
    // This could happen if there is a one way partition between
    // the master and slave, causing the master to get an exited
    // event and marking the slave disconnected but the slave
    // thinking it is still connected. Force a re-registration with
    // the master to reconcile.
    LOG(INFO) << "Master marked the slave as disconnected but the slave"
              << " considers itself registered! Forcing re-registration.";
    detection.discard();
  }

  // If we don't get a ping from the master, trigger a
  // re-registration. This can occur when the master no
  // longer considers the slave to be registered, so it is
  // essential for the slave to attempt a re-registration
  // when this occurs.
  Clock::cancel(pingTimer);

  pingTimer = delay(
      MASTER_PING_TIMEOUT(),
      self(),
      &Slave::pingTimeout,
      detection);

  send(from, PongSlaveMessage());
}
```

Based on all of this, it's really important to understand the implications of 
the slave under and over estimating the timeout on the master side. For 
example, if the slave under-estimates and always triggers re-registration, that 
would be problematic on large clusters (40k slaves / 75 seconds = 500+ 
re-registrations / second in the steady state!). That's why a TODO mentioned 
using upper/lower bounds for this.

- Ben

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29507/#review67113
-----------------------------------------------------------

On Jan. 8, 2015, 8:44 a.m., Adam B wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/29507/
> -----------------------------------------------------------
> 
> (Updated Jan. 8, 2015, 8:44 a.m.)
> 
> 
> Review request for mesos and Niklas Nielsen.
> 
> 
> Bugs: MESOS-2110
>     https://issues.apache.org/jira/browse/MESOS-2110
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> Added new --slave_ping_timeout and --max_slave_ping_timeouts flags
> to mesos-master to replace the existing (still default) 
> SLAVE_PING_TIMEOUT (15secs) and MAX_SLAVE_PING_TIMEOUTS (5).
>    
> These can be extended if slaves are expected/allowed to be down for
> longer than a minute or two.
>   
> Beware that this affects recovery from network timeouts as well as
> actual slave node/process failover.
> 
> 
> Diffs
> -----
> 
>   src/master/constants.hpp c386eab 
>   src/master/constants.cpp 9ee17e9 
>   src/master/flags.hpp f5c8d2a 
>   src/master/master.cpp d6651e2 
>   src/slave/constants.hpp fd1c1ab 
>   src/slave/constants.cpp 2a99b11 
>   src/slave/slave.cpp 50b5781 
>   src/tests/fault_tolerance_tests.cpp 5763486 
>   src/tests/partition_tests.cpp fea7801 
>   src/tests/slave_recovery_tests.cpp cd4a398 
>   src/tests/slave_tests.cpp c50cbc7 
> 
> Diff: https://reviews.apache.org/r/29507/diff/
> 
> 
> Testing
> -------
> 
> Manually tested slave failover/shutdown with master using different 
> --slave_ping_timeout and --max_slave_ping_timeouts.
> Updated unit tests to use shorter non-default values for ping timeouts.
> 
> 
> Thanks,
> 
> Adam B
> 
>

Re: Review Request 29507: Added Configurable Slave Ping Timeouts

Reply via email to