[
https://issues.apache.org/jira/browse/MESOS-7748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16127202#comment-16127202
]
Alexander Rukletsov commented on MESOS-7748:
--------------------------------------------
Based on the comment above, I suggest we do the following:
Terminate a write-stalled connection (1) on a timeout and / or (2) if the
socket queue overflows.
Neither (1) nor (2) protects against a client that does read but cannot keep
up with the rate at which data is sent.
(3) Block the stream write if the socket queue overflows, or (4) propagate the
blocking write to user code, i.e., block all writes if the socket queue overflows.
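
To make options (1) and (2) concrete, here is a minimal sketch in Python rather
than Mesos / libprocess C++. It models an application-level, per-subscriber
queue, which is an assumption; the class name SubscriberQueue, its methods, and
the parameters max_events and stall_timeout are all hypothetical and do not
correspond to any existing Mesos API.

```python
import time
from collections import deque


class SubscriberQueue:
    """Hypothetical per-subscriber event queue with two termination rules."""

    def __init__(self, max_events, stall_timeout, clock=time.monotonic):
        self.max_events = max_events        # overflow threshold, rule (2)
        self.stall_timeout = stall_timeout  # seconds without progress, rule (1)
        self.clock = clock                  # injectable so the rules can be tested
        self.events = deque()
        self.last_progress = clock()
        self.closed = False

    def enqueue(self, event):
        """Queue an event; return False if the subscriber was disconnected."""
        if self.closed:
            return False
        now = self.clock()
        if not self.events:
            # An empty queue means the subscriber is caught up, not stalled.
            self.last_progress = now
        stalled = now - self.last_progress > self.stall_timeout
        overflow = len(self.events) >= self.max_events
        if stalled or overflow:
            self.close()
            return False
        self.events.append(event)
        return True

    def drain(self, n):
        """Call when the socket has accepted n events; records write progress."""
        for _ in range(min(n, len(self.events))):
            self.events.popleft()
        if n > 0:
            self.last_progress = self.clock()

    def close(self):
        """Disconnect the subscriber and release the memory held for it."""
        self.closed = True
        self.events.clear()
```

In this sketch both rules are checked only when a new event is enqueued, so a
stalled subscriber is dropped at the next event rather than exactly when the
timeout expires. Options (3) and (4) would instead make enqueue wait (or ask
the caller to wait) until drain frees space.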
> Slow subscribers of streaming APIs can lead to Mesos OOMing.
> ------------------------------------------------------------
>
> Key: MESOS-7748
> URL: https://issues.apache.org/jira/browse/MESOS-7748
> Project: Mesos
> Issue Type: Bug
> Reporter: Alexander Rukletsov
> Assignee: Alexander Rukletsov
> Priority: Critical
> Labels: mesosphere, reliability
>
> For each active subscriber, Mesos master / slave maintains an event queue,
> which grows over time if the subscriber does not read fast enough. As the
> number of such "slow" subscribers grows, so does Mesos master / slave memory
> consumption, which might lead to an OOM event.
> Ideas to consider:
> * Restrict the number of subscribers for the streaming APIs
> * Check (ping) for inactive or "slow" subscribers
> * Disconnect the subscriber when there are too many queued events in memory