[jira] [Assigned] (MESOS-5361) Consider introducing TCP KeepAlive for Libprocess sockets.

Ilya Pronin (JIRA) Mon, 26 Jun 2017 07:22:19 -0700

     [ 
https://issues.apache.org/jira/browse/MESOS-5361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Ilya Pronin reassigned MESOS-5361:
----------------------------------

    Assignee: Ilya Pronin

> Consider introducing TCP KeepAlive for Libprocess sockets.
> ----------------------------------------------------------
>
>                 Key: MESOS-5361
>                 URL: https://issues.apache.org/jira/browse/MESOS-5361
>             Project: Mesos
>          Issue Type: Improvement
>          Components: libprocess
>            Reporter: Anand Mazumdar
>            Assignee: Ilya Pronin
>              Labels: mesosphere
>
> We currently don't use TCP KeepAlive's when creating sockets in libprocess. 
> This might benefit master - scheduler, master - agent connections i.e. we can 
> detect if any of them failed faster.
> Currently, if the master process goes down. If for some reason the {{RST}} 
> sequence did not reach the scheduler, the scheduler can only come to know 
> about the disconnection when it tries to do a {{send}} itself. 
> The default TCP keep alive values on Linux are of little use in a real world 
> application:
> {code}
> . This means that the keepalive routines wait for two hours (7200 secs) 
> before sending the first keepalive probe, and then resend it every 75 
> seconds. If no ACK response is received for nine consecutive times, the 
> connection is marked as broken.
> {code}
> However, for long running instances of scheduler/agent this still can be 
> beneficial. Also, operators might start tuning the values for their clusters 
> explicitly once we start supporting it.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Assigned] (MESOS-5361) Consider introducing TCP KeepAlive for Libprocess sockets.

Reply via email to