[
https://issues.apache.org/jira/browse/MESOS-5723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Joseph Wu updated MESOS-5723:
-----------------------------
Shepherd: Joris Van Remoortere
> SSL-enabled libprocess will leak incoming links to forks
> --------------------------------------------------------
>
> Key: MESOS-5723
> URL: https://issues.apache.org/jira/browse/MESOS-5723
> Project: Mesos
> Issue Type: Bug
> Components: libprocess
> Affects Versions: 0.24.0, 0.25.0, 0.26.0, 0.27.0, 0.28.0
> Reporter: Joseph Wu
> Assignee: Joseph Wu
> Priority: Blocker
> Labels: libprocess, mesosphere, ssl
> Fix For: 1.0.0
>
>
> Encountered two different buggy behaviors that can be tracked down to the
> same underlying problem.
> Repro #1 (non-crashy):
> (1) Start a master. Doesn't matter if SSL is enabled or not.
> (2) Start an agent, with SSL enabled. Downgrade support has the same
> problem. The master/agent {{link}} to one another.
> (3) Run a sleep task. Keep this alive. If you inspect FDs at this point,
> you'll notice the task has inherited the {{link}} FD (master -> agent).
> (4) Restart the agent. Due to (3), the master's {{link}} stays open.
> (5) Check master's logs for the agent's re-registration message.
> (6) Check the agent's logs for re-registration. The message will not appear.
> The master is actually using the old {{link}} which is not connected to the
> agent.
> ----
> Repro #2 (crashy):
> (1) Start a master. Doesn't matter if SSL is enabled or not.
> (2) Start an agent, with SSL enabled. Downgrade support has the same problem.
> (3) Run ~100 sleep tasks one after the other, keeping them all alive. Each task
> links back to the agent. Due to an FD leak, each task will inherit the
> incoming links from all other actors...
> (4) At some point, the agent will run out of FDs and kernel panic.
> ----
> It appears that the SSL socket {{accept}} call is missing {{os::nonblock}}
> and {{os::cloexec}} calls:
> https://github.com/apache/mesos/blob/4b91d936f50885b6a66277e26ea3c32fe942cf1a/3rdparty/libprocess/src/libevent_ssl_socket.cpp#L794-L806
> For reference, here's {{poll}} socket's {{accept}}:
> https://github.com/apache/mesos/blob/4b91d936f50885b6a66277e26ea3c32fe942cf1a/3rdparty/libprocess/src/poll_socket.cpp#L53-L75
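A minimal sketch of the missing step, assuming plain POSIX calls in place of libprocess's own {{os::nonblock}} and {{os::cloexec}} helpers. The function names {{setNonblockCloexec}} and {{acceptNonblockCloexec}} are hypothetical and do not appear in libprocess; this only illustrates marking an accepted FD so that it is not inherited by exec'd tasks, and is not the actual patch.

```cpp
#include <cerrno>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper (not libprocess code): sets O_NONBLOCK and
// FD_CLOEXEC on `fd`. Returns 0 on success, -1 on failure (errno set).
int setNonblockCloexec(int fd)
{
  // File status flags (O_NONBLOCK) are read and written with F_GETFL/F_SETFL.
  int status = fcntl(fd, F_GETFL);
  if (status == -1) {
    return -1;
  }
  if (fcntl(fd, F_SETFL, status | O_NONBLOCK) == -1) {
    return -1;
  }

  // Descriptor flags (FD_CLOEXEC) are read and written with F_GETFD/F_SETFD.
  int descriptor = fcntl(fd, F_GETFD);
  if (descriptor == -1) {
    return -1;
  }
  if (fcntl(fd, F_SETFD, descriptor | FD_CLOEXEC) == -1) {
    return -1;
  }

  return 0;
}

// Hypothetical accept wrapper: accepts one connection on `listener` and
// marks the new FD non-blocking and close-on-exec before returning it.
// Returns the new FD, or -1 on failure (errno set).
int acceptNonblockCloexec(int listener)
{
  int fd = accept(listener, nullptr, nullptr);
  if (fd == -1) {
    return -1;
  }

  if (setNonblockCloexec(fd) == -1) {
    int saved = errno; // Preserve the fcntl error across close().
    close(fd);
    errno = saved;
    return -1;
  }

  return fd;
}
```

Note that a separate {{fcntl}} call leaves a short window between {{accept}} and setting FD_CLOEXEC in which a concurrent fork/exec could still inherit the FD. On Linux, {{accept4}} with {{SOCK_NONBLOCK | SOCK_CLOEXEC}} sets both flags atomically and avoids that window.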