[ 
https://issues.apache.org/jira/browse/MESOS-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15394584#comment-15394584
 ] 

Steven Schlansker commented on MESOS-5910:
------------------------------------------

It seems that SUBSCRIBE actually gives you a snapshot of the current state when 
you initially subscribe, so perhaps this really is only an issue during master 
failovers.  So this is probably of somewhat lower importance than I thought, 
although correctly handling master failover without losing events is still 
desirable.

> Operator SUBSCRIBE api should provide a method to get all events without 
> requiring 100% uptime
> ----------------------------------------------------------------------------------------------
>
>                 Key: MESOS-5910
>                 URL: https://issues.apache.org/jira/browse/MESOS-5910
>             Project: Mesos
>          Issue Type: Improvement
>          Components: HTTP API, json api
>    Affects Versions: 1.0.0
>            Reporter: Steven Schlansker
>
> The v1.0 Operator API adds a new SUBSCRIBE call, which returns a stream of 
> events as they occur.  This is going to be extremely useful for monitoring 
> and management jobs, as they can now have timely information about Mesos's 
> operation without requiring repeated polling or other ugly solutions.
> Unfortunately, the SUBSCRIBE call only returns events from the time the call 
> is made onward.  This means that no consumer can reliably subscribe to "all 
> events"; if the application goes offline (network blip, code upgrade, etc.), 
> all events during that downtime are lost.
> You could instead have a cluster of applications receiving the events and 
> coordinating to deduplicate them to increase reliability, but this pushes a 
> lot of complexity into clients, and I suspect most users would not do this 
> correctly and would potentially lose events.
> It would be extremely useful for a single client to be able to get a reliable 
> event stream without requiring a single HTTP connection to be 100% available.
> One possible solution is to assign every event an ID.  Then, extend the API 
> to take a "start position" in the log.  The API immediately streams out all 
> events from the start event up until the tail of the log, and then continues 
> emitting new events as they occur.  This provides a reliable way for a 
> consumer to get "at least once" semantics on events.  The caveat is that the 
> consumer may only be down for as long as the master retains event history, 
> but this is a much easier pill to swallow.  This is similar to etcd's "watch" 
> api, if you are looking for an actual implementation to reference.
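
The "start position" proposal quoted above can be sketched roughly as follows.
This is only an illustration of the idea, not Mesos code: the EventLog class,
its retention parameter, and the read_from method are hypothetical names, and
the event payloads are placeholder strings.  It assumes the master keeps a
bounded in-memory history and assigns monotonically increasing IDs.

```python
from collections import deque
from typing import Any, Deque, List, Tuple


class EventLog:
    """Hypothetical bounded event history (not part of the Mesos API)."""

    def __init__(self, retention: int) -> None:
        # Only the most recent `retention` events are kept; older ones drop off.
        self._events: Deque[Tuple[int, Any]] = deque(maxlen=retention)
        self._next_id = 1

    def append(self, payload: Any) -> int:
        """Record an event and return the ID assigned to it."""
        event_id = self._next_id
        self._next_id += 1
        self._events.append((event_id, payload))
        return event_id

    def read_from(self, start_id: int) -> List[Tuple[int, Any]]:
        """Return every retained event whose ID is >= start_id.

        Raises LookupError if start_id is older than the retained history,
        so a consumer learns that it missed events instead of silently
        skipping them.
        """
        oldest = self._events[0][0] if self._events else self._next_id
        if start_id < oldest:
            raise LookupError(
                f"events before ID {oldest} are no longer retained"
            )
        return [(i, p) for i, p in self._events if i >= start_id]
```

A consumer would remember the ID of the last event it processed and, after a
reconnect, ask for everything starting at that ID plus one.  Asking from the
last processed ID itself and discarding duplicates would give the "at least
once" behavior described in the issue.  The LookupError case corresponds to the
caveat about the consumer being down for longer than the master retains
history.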



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
