[ https://issues.apache.org/jira/browse/MESOS-7748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Benjamin Mahler updated MESOS-7748:
-----------------------------------
Summary: Slow subscribers of streaming APIs can lead to Mesos master OOM event.
(was: Streaming API subscribers can lead to Mesos master OOM event.)
Description:
For each active subscriber, Mesos master / slave maintains an event queue,
which grows over time if the subscriber does not read fast enough. As the
number of such "slow" subscribers grows, so does Mesos master / slave memory
consumption, which might lead to an OOM event.
Ideas to consider:
* Restrict the number of subscribers for the streaming APIs
* Check (ping) for inactive or "slow" subscribers
* Disconnect the subscriber when there are too many queued events in memory
was:
For each active subscriber, Mesos master maintains an event queue, which grows
over time if the subscriber does not read fast enough. As the number of such
"slow" subscribers grows, so does Mesos master memory consumption, which might
lead to an OOM event.
Ideas to consider:
* Restrict the number of subscribers for the streaming API
* Check (ping) for inactive or "slow" subscribers
* Disconnect the subscriber when there are too many queued events in memory
Edited the ticket to reflect that this problem is present on all streaming
APIs, which include the master's operator and scheduler APIs as well as the
agent's streaming API for getting container stdout/stderr. Any others I'm
missing?
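As a rough illustration of the first and third ideas from the description
(capping the number of subscribers, and disconnecting a subscriber whose
backlog grows too large), here is a minimal Python sketch. It is not Mesos
code: `EventHub`, `Subscriber`, `max_subscribers`, and `max_queued` are
hypothetical names made up for this example, and the real master and agent
are written in C++ and structured differently.

```python
from collections import deque


class Subscriber:
    """One streaming-API subscriber with its own in-memory event queue."""

    def __init__(self, name, max_queued):
        self.name = name
        self.max_queued = max_queued  # hypothetical per-subscriber limit
        self.queue = deque()
        self.connected = True

    def read(self):
        """Simulate the client consuming one event; None if nothing queued."""
        return self.queue.popleft() if self.queue else None


class EventHub:
    """Hypothetical stand-in for the master/agent side of a streaming API."""

    def __init__(self, max_subscribers, max_queued):
        self.max_subscribers = max_subscribers
        self.max_queued = max_queued
        self.subscribers = {}

    def subscribe(self, name):
        # Idea 1: restrict the number of subscribers.
        if len(self.subscribers) >= self.max_subscribers:
            raise RuntimeError("too many subscribers")
        sub = Subscriber(name, self.max_queued)
        self.subscribers[name] = sub
        return sub

    def publish(self, event):
        # Idea 3: if a subscriber's backlog is already at the limit,
        # disconnect it and free its queue instead of queueing more.
        for name, sub in list(self.subscribers.items()):
            if len(sub.queue) >= sub.max_queued:
                sub.connected = False
                sub.queue.clear()
                del self.subscribers[name]
            else:
                sub.queue.append(event)


hub = EventHub(max_subscribers=2, max_queued=3)
fast = hub.subscribe("fast")
slow = hub.subscribe("slow")

for i in range(5):
    hub.publish(i)
    fast.read()  # "fast" keeps up; "slow" never reads
```

With these limits, memory held for subscriber queues is bounded by roughly
`max_subscribers * max_queued` events. The trade-off is that a disconnected
subscriber has to resubscribe and rebuild its state, so the limit would need
to be chosen with real event rates in mind.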
> Slow subscribers of streaming APIs can lead to Mesos master OOM event.
> ----------------------------------------------------------------------
>
> Key: MESOS-7748
> URL: https://issues.apache.org/jira/browse/MESOS-7748
> Project: Mesos
> Issue Type: Bug
> Reporter: Alexander Rukletsov
> Assignee: Alexander Rukletsov
> Priority: Critical
> Labels: mesosphere, reliability