[ https://issues.apache.org/jira/browse/AMBARI-14580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aravindan Vijayan updated AMBARI-14580:
---------------------------------------
Summary: AMS collector clients likely to create self simultaneous open tcp
sockets (was: ams collector clients likely to create self simultaneous open
tcp sockets)
> AMS collector clients likely to create self simultaneous open tcp sockets
> -------------------------------------------------------------------------
>
> Key: AMBARI-14580
> URL: https://issues.apache.org/jira/browse/AMBARI-14580
> Project: Ambari
> Issue Type: Bug
> Components: ambari-metrics
> Affects Versions: 2.1.0
> Environment: IBM BigInsights 4.1
> Reporter: David Miller
>
> Multiple clients connect to the Ambari Metrics timeline metrics service. The
> timeline.metrics.service.webapp.address property in the Advanced ams-site
> configuration section sets the collector port, which defaults to 6188.
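> Many of these clients run on the same host as the collector, which can lead
> them to create a self simultaneous open TCP connection when the Ambari
> Metrics collector is not listening on this port (for example, when it is
> stopped). This happens when the OS assigns a client an ephemeral local port
> equal to the collector port: the client's SYN arrives back at its own socket,
> the TCP simultaneous open completes, and the result is a connection whose
> local and remote endpoints are identical. See
> http://stackoverflow.com/questions/5139808/tcp-simultaneous-open-and-self-connect-prevention
> for a discussion of this condition.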
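A self-connected socket can be recognized from the client side, because both of its endpoints have the same address and the same port. The following is a minimal Java sketch of that check, assuming a client built on plain `java.net.Socket` (the real AMS metrics sinks may use a higher-level HTTP client). `SelfConnectGuard` is a hypothetical helper, not Ambari code: it connects with a bounded timeout and, if it finds a self-connect, closes the socket immediately instead of keeping it open.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of Ambari: rejects TCP self-connects.
public final class SelfConnectGuard {

    private SelfConnectGuard() {}

    // True if the socket is open, connected, and both endpoints share the same address and port.
    public static boolean isSelfConnected(Socket socket) {
        return socket.isConnected()
                && !socket.isClosed()
                && socket.getLocalPort() == socket.getPort()
                && socket.getLocalAddress().equals(socket.getInetAddress());
    }

    // Connects with a bounded timeout. A self-connected socket is closed at once,
    // so the client stops holding the collector's port, and reported as an IOException.
    public static Socket connect(String host, int port, int connectTimeoutMs) throws IOException {
        Socket socket = new Socket();
        try {
            socket.connect(new InetSocketAddress(host, port), connectTimeoutMs);
            if (isSelfConnected(socket)) {
                throw new IOException("self-connect on port " + port + ": collector is not listening");
            }
            return socket;
        } catch (IOException | RuntimeException e) {
            try {
                socket.close();
            } catch (IOException closeFailure) {
                e.addSuppressed(closeFailure);
            }
            throw e;
        }
    }
}
```

A caller would treat the resulting IOException like any other connection failure (collector down) and retry later.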
> Once this condition is triggered, the AMS collector cannot start, because
> the port is now held by the client that tried to connect to it.
> A client that connects to itself while expecting to reach the AMS collector
> appears to hold this connection indefinitely.
> We have hit this condition twice by accident and can reproduce it.
> While this condition is possible for any connection whose local and remote
> addresses are the same, it appears especially likely with connections to the
> AMS collector, probably because the collector usually runs on the same
> machine as many other services that connect to it.
> To reproduce the problem:
> 1. Stop the Ambari Metrics collector.
> 2. Wait an unspecified amount of time (hours or days), then check netstat for
> self simultaneous open connections whose local and remote host:port tuples
> are identical, like the following:
>    tcp 0 0 10.93.132.110:6188 10.93.132.110:6188 ESTABLISHED -
> 3. Attempt to start the Ambari Metrics collector. It fails with the error
> line:
>    Caused by: java.net.BindException: Port in use: 0.0.0.0:6188
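The netstat check in step 2 can be scripted. The Java sketch below scans netstat output for TCP entries on the collector port whose local and foreign address:port are identical. It assumes the Linux `netstat -tn` column order seen in the sample line (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State, optionally PID/Program); `NetstatSelfConnect` is a hypothetical name, and other netstat implementations may lay out columns differently.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic, not part of Ambari: flags self-connected TCP sockets in netstat output.
public final class NetstatSelfConnect {

    private NetstatSelfConnect() {}

    // Returns lines whose local and foreign endpoints are identical and use the given port.
    // Assumes columns: Proto Recv-Q Send-Q Local-Address Foreign-Address State [PID/Program].
    public static List<String> findSelfConnects(List<String> netstatLines, int port) {
        List<String> matches = new ArrayList<>();
        String portSuffix = ":" + port;
        for (String line : netstatLines) {
            String[] cols = line.trim().split("\\s+");
            if (cols.length < 5 || !cols[0].startsWith("tcp")) {
                continue; // header, blank, or non-TCP line
            }
            String local = cols[3];
            String foreign = cols[4];
            if (local.equals(foreign) && local.endsWith(portSuffix)) {
                matches.add(line);
            }
        }
        return matches;
    }

    // Reads netstat output from stdin; the port defaults to 6188.
    public static void main(String[] args) throws IOException {
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 6188;
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(System.in))) {
            for (String l = in.readLine(); l != null; l = in.readLine()) {
                lines.add(l);
            }
        }
        for (String hit : findSelfConnects(lines, port)) {
            System.out.println("self-connect: " + hit);
        }
    }
}
```

For example, `netstat -tn | java NetstatSelfConnect.java 6188` (single-file source launch requires Java 11 or later).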
> Possible solutions:
> * Change collector clients to time out connections when no response or an
> unexpected response is received (the connected-to-self scenario).
> * Enable SO_REUSEADDR to possibly decrease the chance of selecting the same
> local port as the remote port.
> * Recommend that users reconfigure their OS's ephemeral port range so that it
> does not include the collector listener port.
> * Increase the reconnect wait time when connecting to the collector.
> * Others?
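As a rough illustration of the suggestions to time out unresponsive connections and to wait longer between reconnects, a client could bound its connect and read timeouts, discard a self-connected socket, and back off exponentially between attempts. `CollectorConnector` and its parameters are hypothetical, not part of Ambari's metrics sinks, and the timeout and backoff values are placeholders rather than recommendations.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical client-side reconnect policy, not Ambari's actual sink code.
public final class CollectorConnector {

    private final int connectTimeoutMs;
    private final int readTimeoutMs;
    private final long initialBackoffMs;
    private final long maxBackoffMs;

    public CollectorConnector(int connectTimeoutMs, int readTimeoutMs,
                              long initialBackoffMs, long maxBackoffMs) {
        this.connectTimeoutMs = connectTimeoutMs;
        this.readTimeoutMs = readTimeoutMs;
        this.initialBackoffMs = initialBackoffMs;
        this.maxBackoffMs = maxBackoffMs;
    }

    // Wait before retry number `attempt` (0-based): initial * 2^attempt, capped at the maximum.
    public long backoffMs(int attempt) {
        long delay = initialBackoffMs;
        for (int i = 0; i < attempt && delay < maxBackoffMs; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxBackoffMs);
    }

    // Tries up to maxAttempts times, sleeping with exponential backoff between failures.
    public Socket connect(String host, int port, int maxAttempts)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            Socket socket = new Socket();
            try {
                socket.connect(new InetSocketAddress(host, port), connectTimeoutMs);
                // Identical local and remote endpoints mean a self-connect, not the collector.
                if (socket.getLocalPort() == socket.getPort()
                        && socket.getLocalAddress().equals(socket.getInetAddress())) {
                    throw new IOException("self-connect on port " + port);
                }
                // Bound every blocking read so an unresponsive peer cannot hold the socket forever.
                socket.setSoTimeout(readTimeoutMs);
                return socket;
            } catch (IOException e) {
                try {
                    socket.close(); // drop the connection, including a self-connected one
                } catch (IOException closeFailure) {
                    e.addSuppressed(closeFailure);
                }
                last = e;
            }
            if (attempt + 1 < maxAttempts) {
                Thread.sleep(backoffMs(attempt));
            }
        }
        throw last;
    }
}
```

Backing off does not prevent a self-connect, but it reduces how often a client probes the collector port while the collector is down, and closing a self-connected socket right away means the client no longer holds the port indefinitely.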
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)