[ https://issues.apache.org/jira/browse/MESOS-7748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Mahler updated MESOS-7748:
-----------------------------------
        Summary: Slow subscribers of streaming APIs can lead to Mesos master 
OOM event.  (was: Streaming API subscribers can lead to Mesos master OOM event.)
    Description: 
For each active subscriber, Mesos master / slave maintains an event queue, 
which grows over time if the subscriber does not read fast enough. As the 
number of such "slow" subscribers grows, so does Mesos master / slave memory 
consumption, which might lead to an OOM event.

Ideas to consider:
* Restrict the number of subscribers for the streaming APIs
* Check (ping) for inactive or "slow" subscribers
* Disconnect the subscriber when there are too many queued events in memory

  was:
For each active subscriber, Mesos master maintains an event queue, which grows 
over time if the subscriber does not read fast enough. As the number of such 
"slow" subscribers grows, so does Mesos master memory consumption, which might 
lead to an OOM event.

Ideas to consider:
* Restrict the number of subscribers for the streaming API
* Check (ping) for inactive or "slow" subscribers
* Disconnect the subscriber when there are too many queued events in memory


Edited the ticket to reflect that this problem is present on all streaming 
APIs, including the master's operator and scheduler APIs as well as the 
agent's streaming API for getting container stdout/stderr. Any others I'm 
missing?
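
To make the third idea concrete (disconnect the subscriber when there are too 
many queued events in memory), here is a rough sketch. It is not Mesos code: 
the Subscriber and Broadcaster classes and the max_queued_events cap are 
hypothetical names used only for illustration, and the real fix would live in 
the master / agent HTTP connection handling.

```python
from collections import deque


class Subscriber:
    """Hypothetical stand-in for one streaming API connection."""

    def __init__(self, name, max_queued_events):
        self.name = name
        self.max_queued_events = max_queued_events
        self.queue = deque()  # events not yet read by the client
        self.connected = True

    def enqueue(self, event):
        """Queue an event; disconnect if the backlog has reached the cap."""
        if not self.connected:
            return False
        if len(self.queue) >= self.max_queued_events:
            # Subscriber is too slow: drop its backlog and close it,
            # which releases the memory held by the queued events.
            self.queue.clear()
            self.connected = False
            return False
        self.queue.append(event)
        return True

    def read(self):
        """Simulate the client consuming one event, if any is queued."""
        return self.queue.popleft() if self.queue else None


class Broadcaster:
    """Hypothetical stand-in for the component that fans out events."""

    def __init__(self, max_queued_events):
        self.max_queued_events = max_queued_events
        self.subscribers = {}

    def subscribe(self, name):
        sub = Subscriber(name, self.max_queued_events)
        self.subscribers[name] = sub
        return sub

    def publish(self, event):
        # Copy the items so disconnected subscribers can be removed
        # while iterating.
        for name, sub in list(self.subscribers.items()):
            if not sub.enqueue(event):
                del self.subscribers[name]


broadcaster = Broadcaster(max_queued_events=3)
fast = broadcaster.subscribe("fast")
slow = broadcaster.subscribe("slow")

for i in range(10):
    broadcaster.publish(i)
    fast.read()  # the fast subscriber keeps up; the slow one never reads
```

With a cap of 3, the slow subscriber is dropped on the fourth event while the 
fast one stays subscribed, so memory per subscriber is bounded by the cap 
rather than by how far behind the client falls.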

> Slow subscribers of streaming APIs can lead to Mesos master OOM event.
> ----------------------------------------------------------------------
>
>                 Key: MESOS-7748
>                 URL: https://issues.apache.org/jira/browse/MESOS-7748
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Alexander Rukletsov
>            Assignee: Alexander Rukletsov
>            Priority: Critical
>              Labels: mesosphere, reliability
>
> For each active subscriber, Mesos master / slave maintains an event queue, 
> which grows over time if the subscriber does not read fast enough. As the 
> number of such "slow" subscribers grows, so does Mesos master / slave memory 
> consumption, which might lead to an OOM event.
> Ideas to consider:
> * Restrict the number of subscribers for the streaming APIs
> * Check (ping) for inactive or "slow" subscribers
> * Disconnect the subscriber when there are too many queued events in memory



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
