[jira] [Commented] (MESOS-5361) Consider introducing TCP KeepAlive for Libprocess sockets.

haosdent (JIRA) Sat, 14 May 2016 09:38:27 -0700

    [ 
https://issues.apache.org/jira/browse/MESOS-5361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15283605#comment-15283605
 ]


haosdent commented on MESOS-5361:
---------------------------------

I see. XD

> Consider introducing TCP KeepAlive for Libprocess sockets.
> ----------------------------------------------------------
>
>                 Key: MESOS-5361
>                 URL: https://issues.apache.org/jira/browse/MESOS-5361
>             Project: Mesos
>          Issue Type: Improvement
>          Components: libprocess
>            Reporter: Anand Mazumdar
>              Labels: mesosphere
>
> We currently don't use TCP keepalives when creating sockets in libprocess. 
> Enabling them might benefit master - scheduler and master - agent 
> connections, i.e. we could detect a failed peer faster.
> Currently, if the master process goes down and for some reason the {{RST}} 
> segment does not reach the scheduler, the scheduler only learns about the 
> disconnection when it tries to do a {{send}} itself. 
> The default TCP keepalive values on Linux are of little use in a real-world 
> application:
> {code}
> . This means that the keepalive routines wait for two hours (7200 secs) 
> before sending the first keepalive probe, and then resend it every 75 
> seconds. If no ACK response is received for nine consecutive times, the 
> connection is marked as broken.
> {code}
> However, for long-running instances of the scheduler/agent this can still be 
> beneficial. Also, operators might start tuning the values for their clusters 
> explicitly once we start supporting it.
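
For illustration only, here is a minimal sketch of what enabling and tuning keepalive on a socket could look like. It is not libprocess code. It uses Python's standard socket module, and the helper names (enable_keepalive, keepalive_detection_seconds) and the tuned values (60 s idle, 10 s interval, 3 probes) are assumptions chosen for the example. The TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT options are Linux-specific, so the sketch skips any that the platform does not expose.

```python
import socket


def keepalive_detection_seconds(idle, interval, probes):
    """Approximate time for keepalive to declare an idle, dead peer broken.

    The kernel waits `idle` seconds before the first probe and then allows
    `probes` unanswered probes spaced `interval` seconds apart.
    """
    return idle + interval * probes


def enable_keepalive(sock, idle=60, interval=10, probes=3):
    """Turn on TCP keepalive for `sock` and, where supported, tune its timers.

    The default values here are illustrative assumptions, not recommendations.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket timer options; only set the ones this platform provides.
    for name, value in (
        ("TCP_KEEPIDLE", idle),       # seconds of idleness before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between probes
        ("TCP_KEEPCNT", probes),      # unanswered probes before giving up
    ):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)


if __name__ == "__main__":
    # Linux defaults quoted in the issue: 7200 s idle, 75 s interval, 9 probes.
    print(keepalive_detection_seconds(7200, 75, 9))
    # The illustrative tuned values used as defaults above.
    print(keepalive_detection_seconds(60, 10, 3))

    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        enable_keepalive(s)
```

With the Linux defaults quoted above, an idle connection to a dead peer is only marked broken after roughly 7200 + 9 * 75 = 7875 seconds, a little over two hours and eleven minutes. With the illustrative tuned values the same calculation gives 90 seconds, which is why per-cluster tuning matters.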



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
