[ https://issues.apache.org/jira/browse/AMBARI-14580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aravindan Vijayan updated AMBARI-14580:
---------------------------------------
    Summary: AMS collector clients likely to create self simultaneous open tcp
sockets (was: ams collector clients likely to create self simultaneous open tcp
sockets)

> AMS collector clients likely to create self simultaneous open tcp sockets
> -------------------------------------------------------------------------
>
>                 Key: AMBARI-14580
>                 URL: https://issues.apache.org/jira/browse/AMBARI-14580
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-metrics
>    Affects Versions: 2.1.0
>         Environment: IBM BigInsights 4.1
>            Reporter: David Miller
>
> Multiple clients connect to the Ambari Metrics timeline metrics service. The
> timeline.metrics.service.webapp.address property in the Advanced ams-site
> configuration section sets the collector port, 6188 by default.
> Many of these clients run on the same host as the collector, which can lead
> them to create a TCP self-connection (a client simultaneously opening a
> connection to itself) when the Ambari Metrics collector is not listening on
> this port, for example when it is stopped. This happens when the kernel
> assigns a client an ephemeral source port equal to the destination port 6188:
> the client's SYN arrives back at its own socket and TCP simultaneous open
> completes the handshake. See
> http://stackoverflow.com/questions/5139808/tcp-simultaneous-open-and-self-connect-prevention
> for a discussion of this condition.
> Once this condition is triggered, the AMS collector cannot start because the
> port is now held by the client that tried to connect to it.
> Any client that connects to itself while expecting to reach the AMS collector
> appears to hold this connection indefinitely.
> We have hit this condition twice by accident and can reproduce it.
> While this condition is possible for any connection whose local and remote
> addresses are the same, it appears especially likely with connections to the
> AMS collector, probably because the collector usually runs on the same
> machine as many other services that try to connect to it.
> To reproduce the problem:
> 1. Stop the Ambari Metrics collector.
> 2. Wait an unspecified amount of time (hours or days) and check netstat for
> self-connected sockets whose local and remote host:port pairs are identical,
> such as:
>        tcp        0      0 10.93.132.110:6188          10.93.132.110:6188          ESTABLISHED -
> 3. Attempt to start the Ambari Metrics collector. It fails with the error:
>        Caused by: java.net.BindException: Port in use: 0.0.0.0:6188
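The netstat check in step 2 can be scripted. This is a minimal sketch, assuming the Linux net-tools `netstat -tn` column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State); `find_self_connections` is a hypothetical helper, not part of Ambari. It flags TCP sockets whose local and foreign endpoints are identical and sit on the collector port.

```python
def find_self_connections(netstat_output, port=6188):
    """Return (local, foreign, state) tuples for self-connected TCP sockets.

    Hypothetical helper; assumes `netstat -tn` style columns:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State [PID/Program]
    """
    hits = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip the banner, the column header and any non-TCP rows.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, foreign, state = fields[3], fields[4], fields[5]
        # A self-connect has identical endpoints; keep only the collector port.
        if local == foreign and local.rsplit(":", 1)[-1] == str(port):
            hits.append((local, foreign, state))
    return hits
```

On a live host the input could come from `subprocess.run(["netstat", "-tn"], capture_output=True, text=True).stdout`; matching on exact string equality keeps the check independent of whether the address is IPv4 or IPv6.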
> Possible Solutions:
> * Change collector clients to time out connections when no response or an
> unexpected response is received (the connected-to-self scenario).
> * Enable SO_REUSEADDR to possibly decrease the chance of selecting the same
> local port as the remote port.
> * Recommend that users reconfigure their OS's ephemeral port range to exclude
> the collector listener port.
> * Increase the reconnect wait time when connecting to the collector.
> * Others?
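Two of the suggested mitigations can be sketched on the client side. This is a minimal illustration under stated assumptions, not the actual AMS client code: `connect_to_collector`, `is_self_connect`, `port_in_ephemeral_range` and `read_ephemeral_range` are hypothetical names, and the `/proc` path assumes a Linux host. The connect helper applies a timeout (the first suggestion) and refuses a socket whose local and remote endpoints match; the range check tells an operator whether the collector port can be handed out as an ephemeral port at all (the third suggestion).

```python
import socket

# Assumption: Linux exposes the ephemeral port range here as "low<TAB>high".
EPHEMERAL_RANGE_PATH = "/proc/sys/net/ipv4/ip_local_port_range"


def is_self_connect(local_addr, remote_addr):
    """True when a socket's local and remote endpoints are identical."""
    # Compare (host, port) only; IPv6 address tuples also carry flowinfo/scope_id.
    return tuple(local_addr[:2]) == tuple(remote_addr[:2])


def connect_to_collector(host, port=6188, timeout=5.0):
    """Hypothetical client helper: connect with a timeout, refuse a self-connect."""
    # create_connection applies `timeout` to the connect and to later reads and
    # writes, so a silent peer (including the client itself) cannot block forever.
    sock = socket.create_connection((host, port), timeout=timeout)
    if is_self_connect(sock.getsockname(), sock.getpeername()):
        # Close at once instead of holding the self-connection open indefinitely.
        sock.close()
        raise ConnectionError(
            f"self-connect on {host}:{port}; collector is probably not listening")
    return sock


def port_in_ephemeral_range(port, range_text):
    """True if `port` lies inside a range written as 'low high' (any whitespace)."""
    low, high = (int(field) for field in range_text.split())
    return low <= port <= high


def read_ephemeral_range(path=EPHEMERAL_RANGE_PATH):
    """Read the kernel's ephemeral port range as text (Linux only)."""
    with open(path) as f:
        return f.read()
```

On a Linux collector host, `port_in_ephemeral_range(6188, read_ephemeral_range())` returning `True` means local clients can be assigned 6188 as a source port, which is the precondition for the self-connect described above. A client that catches the `ConnectionError` could then wait longer before its next attempt, in line with the last suggestion.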



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
