[ 
https://issues.apache.org/jira/browse/MESOS-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Hamon updated MESOS-1517:
---------------------------------
    Labels: reliability twitter  (was: reliability)

> Maintain a queue of messages that arrive before the master recovers.
> --------------------------------------------------------------------
>
>                 Key: MESOS-1517
>                 URL: https://issues.apache.org/jira/browse/MESOS-1517
>             Project: Mesos
>          Issue Type: Improvement
>          Components: master
>            Reporter: Benjamin Mahler
>              Labels: reliability, twitter
>
> Currently, while the master is recovering, we drop all incoming messages. If 
> slaves and frameworks learned about the leading master only after it had 
> recovered, we would expect to see messages only once recovery is complete.
> We previously considered re-enqueuing all messages through the recovery future, 
> but this has the downside of forcing every message to go through the master's 
> queue twice:
> {code}
>   // TODO(bmahler): Consider instead re-enqueuing *all* messages
>   // through recover(). What are the performance implications of
>   // the additional queueing delay and the accumulated backlog
>   // of messages post-recovery?
>   if (!recovered.get().isReady()) {
>     VLOG(1) << "Dropping '" << event.message->name << "' message since "
>             << "not recovered yet";
>     ++metrics.dropped_messages;
>     return;
>   }
> {code}
> However, an easy solution to this problem is to maintain an explicit queue of 
> incoming messages that is flushed once recovery finishes. Messages that 
> arrive during recovery are then held rather than dropped, and all messages 
> that arrive post-recovery are processed normally.
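
For illustration, the explicit queue proposed above might look roughly like the sketch below. This is not Mesos code: `Message`, `RecoveringMaster`, `receive()`, `finishRecovery()`, `pendingCount()` and `processed()` are hypothetical names, and the sketch assumes a single-threaded, actor-style master in which messages are handled one at a time.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an incoming message; not the Mesos MessageEvent.
struct Message {
  std::string name;
};

// Hypothetical, simplified master that illustrates only the queueing idea.
class RecoveringMaster {
public:
  // Entry point for every incoming message.
  void receive(Message message) {
    if (!recovered_) {
      // Hold the message until recovery finishes instead of dropping it.
      pending_.push_back(std::move(message));
      return;
    }
    // Post-recovery messages take the normal path and are handled once.
    process(message);
  }

  // Called exactly once when recovery completes: flush the backlog in
  // arrival order.
  void finishRecovery() {
    assert(!recovered_);
    recovered_ = true;
    while (!pending_.empty()) {
      Message message = std::move(pending_.front());
      pending_.pop_front();
      process(message);
    }
  }

  std::size_t pendingCount() const { return pending_.size(); }
  const std::vector<std::string>& processed() const { return processed_; }

private:
  // Placeholder for the real message handling; records the name only.
  void process(const Message& message) {
    processed_.push_back(message.name);
  }

  bool recovered_ = false;
  std::deque<Message> pending_;
  std::vector<std::string> processed_;
};
```

In this sketch, messages passed to `receive()` before `finishRecovery()` are buffered and then handled in arrival order, while later messages are handled directly. The queue here is unbounded; a real implementation would probably need a size limit or a metric for the backlog, which this sketch does not attempt.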



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
