[ https://issues.apache.org/jira/browse/MESOS-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14038988#comment-14038988 ]
Dominic Hamon commented on MESOS-1517:
--------------------------------------
This seems like a worthwhile change.
There will be a subtle change in behaviour for message senders. Currently,
senders either get a response within a certain time (if the master has
recovered) or no response at all. With this change, senders will always* get a
response, but it may take longer if recovery is in progress. I doubt that any
senders currently rely on the old behaviour (they shouldn't).
* for certain definitions of always
> Maintain a queue of messages that arrive before the master recovers.
> --------------------------------------------------------------------
>
> Key: MESOS-1517
> URL: https://issues.apache.org/jira/browse/MESOS-1517
> Project: Mesos
> Issue Type: Improvement
> Components: master
> Reporter: Benjamin Mahler
> Labels: reliability
> Fix For: 0.19.0
>
>
> Currently, when the master is recovering, we drop all incoming messages. If
> slaves and frameworks learned about the leading master only once it had
> recovered, we would only expect to see messages after recovery.
> We previously considered re-enqueuing all messages through the recovery
> future, but this has the downside of forcing every message to go through the
> master's queue twice:
> {code}
> // TODO(bmahler): Consider instead re-enqueuing *all* messages
> // through recover(). What are the performance implications of
> // the additional queueing delay and the accumulated backlog
> // of messages post-recovery?
> if (!recovered.get().isReady()) {
>   VLOG(1) << "Dropping '" << event.message->name << "' message since "
>           << "not recovered yet";
>   ++metrics.dropped_messages;
>   return;
> }
> {code}
> However, a simpler solution is to maintain an explicit queue of incoming
> messages that is flushed once recovery finishes. Messages that arrive during
> recovery are then kept rather than dropped, and messages that arrive after
> recovery are processed normally, without an extra trip through the master's
> queue.
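A minimal sketch of the explicit-queue approach, written in C++ like the snippet above. Everything in it is hypothetical: `Message` stands in for the master's incoming message events, `Master` is a toy class rather than the real Mesos master, and `recoveryFinished()` stands in for whatever code runs once the recovery future becomes ready. It also assumes, as with a libprocess actor, that messages are handled one at a time on a single thread, so no locking is shown.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <string>
#include <vector>

// Hypothetical stand-in for an incoming message; the real master
// receives libprocess message events, which this sketch does not model.
struct Message {
  std::string name;
  std::string body;
};

// Hypothetical, simplified master that buffers messages while it is
// recovering and flushes them, in arrival order, once recovery is done.
class Master {
 public:
  // Entry point for every incoming message.
  void visit(const Message& message) {
    if (!recovered_) {
      // Not recovered yet: queue the message instead of dropping it.
      pending_.push(message);
      return;
    }
    handle(message);
  }

  // Hypothetical hook invoked once recovery completes (assumed to
  // correspond to the recovery future becoming ready).
  void recoveryFinished() {
    assert(!recovered_ && "recovery should complete only once");
    recovered_ = true;

    // Drain the backlog in FIFO order so buffered messages are handled
    // in the order they arrived, before any later message.
    while (!pending_.empty()) {
      handle(pending_.front());
      pending_.pop();
    }
  }

  std::size_t pendingCount() const { return pending_.size(); }
  const std::vector<std::string>& handled() const { return handled_; }

 private:
  // Placeholder for normal message dispatch; it only records the
  // message name so the processing order can be inspected.
  void handle(const Message& message) { handled_.push_back(message.name); }

  bool recovered_ = false;
  std::queue<Message> pending_;
  std::vector<std::string> handled_;
};
```

For example, if two messages arrive before `recoveryFinished()` is called, `visit()` queues them, and `recoveryFinished()` then handles both, in arrival order, ahead of any message that arrives afterwards. The queue in this sketch is unbounded, so the TODO's question about the accumulated backlog of messages post-recovery still applies.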
--
This message was sent by Atlassian JIRA
(v6.2#6252)