[ 
https://issues.apache.org/jira/browse/CASSANDRA-11093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15189513#comment-15189513
 ] 

Sam Tunnicliffe commented on CASSANDRA-11093:
---------------------------------------------

cc [~norman] just FYI, the bug in the native epoll transport I mentioned above 
can be seen 
[here|https://github.com/netty/netty/blob/netty-4.0.34.Final/transport-native-epoll/src/main/c/io_netty_channel_epoll_Native.h#L61]
 (note the additional trailing 's' in the method name). This was fixed by 
[7e057de|https://github.com/netty/netty/commit/7e057de98bdea0f6b268a63a6b1aba2483ede4da]
 for [#4800|https://github.com/netty/netty/issues/4800]

> process restarts are failing because native port and jmx ports are in use
> -------------------------------------------------------------------------
>
>                 Key: CASSANDRA-11093
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11093
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Configuration
>         Environment: PROD
>            Reporter: varun
>            Priority: Minor
>              Labels: lhf
>
> A process restart should take care of this automatically, but it does not, 
> and that is a problem.
> The ports are considered in use even after the process has quit, died, or 
> been killed, while sockets remain in the TIME_WAIT state of the TCP FSM 
> (http://tcpipguide.com/free/t_TCPOperationalOverviewandtheTCPFiniteStateMachineF-2.htm).
> tcp 0 0 127.0.0.1:7199 0.0.0.0:* LISTEN 30099/java
> tcp 0 0 192.168.1.2:9160 0.0.0.0:* LISTEN 30099/java
> tcp 0 0 10.130.128.131:58263 10.130.128.131:9042 TIME_WAIT -
> tcp 0 0 10.130.128.131:58262 10.130.128.131:9042 TIME_WAIT -
> tcp 0 0 ::ffff:10.130.128.131:9042 :::* LISTEN 30099/java
> tcp 0 0 ::ffff:10.130.128.131:9042 ::ffff:10.130.128.131:57191 ESTABLISHED 30099/java
> tcp 0 0 ::ffff:10.130.128.131:9042 ::ffff:10.130.128.131:57190 ESTABLISHED 30099/java
> tcp 0 0 ::ffff:10.130.128.131:9042 ::ffff:10.176.70.226:37105 ESTABLISHED 30099/java
> tcp 0 0 ::ffff:127.0.0.1:42562 ::ffff:127.0.0.1:7199 TIME_WAIT -
> tcp 0 0 ::ffff:10.130.128.131:57190 ::ffff:10.130.128.131:9042 ESTABLISHED 30138/java
> tcp 0 0 ::ffff:10.130.128.131:57198 ::ffff:10.130.128.131:9042 ESTABLISHED 30138/java
> tcp 0 0 ::ffff:10.130.128.131:9042 ::ffff:10.176.70.226:37106 ESTABLISHED 30099/java
> tcp 0 0 ::ffff:10.130.128.131:57197 ::ffff:10.130.128.131:9042 ESTABLISHED 30138/java
> tcp 0 0 ::ffff:10.130.128.131:57191 ::ffff:10.130.128.131:9042 ESTABLISHED 30138/java
> tcp 0 0 ::ffff:10.130.128.131:9042 ::ffff:10.130.128.131:57198 ESTABLISHED 30099/java
> tcp 0 0 ::ffff:10.130.128.131:9042 ::ffff:10.130.128.131:57197 ESTABLISHED 30099/java
> tcp 0 0 ::ffff:127.0.0.1:42567 ::ffff:127.0.0.1:7199 TIME_WAIT -
> I had to write a restart handler that calls netstat and waits until all 
> TIME_WAIT sockets have expired before starting the node back up. This 
> happened on 26 of the 56 nodes when a rolling restart was performed. The 
> issue was mostly around JMX port 7199. Another rolling restart was done on 
> those 26 nodes to remediate the JMX port issue. In that restart, one node 
> hit the issue on port 9042: the port was considered in use after the 
> restart, and the process died a short time later.
> What needs to be done for the native port 9042 and JMX port 7199 is to 
> create the underlying TCP socket with SO_REUSEADDR. This eases the 
> restriction and allows a process to bind the port even if there are 
> sockets for that port still open in the TCP FSM, as long as no other 
> process is listening on that port. There is a Java method available to set 
> this option in java.net.Socket: 
> https://docs.oracle.com/javase/7/docs/api/java/net/Socket.html#setReuseAddress%28boolean%29
> native port 9042: 
> https://github.com/apache/cassandra/blob/4a0d1caa262af3b6f2b6d329e45766b4df845a88/tools/stress/src/org/apache/cassandra/stress/settings/SettingsPort.java#L38
> JMX port 7199: 
> https://github.com/apache/cassandra/blob/4a0d1caa262af3b6f2b6d329e45766b4df845a88/tools/stress/src/org/apache/cassandra/stress/settings/SettingsPort.java#L40
> Looking at the code itself, this option is already being set on the Thrift 
> port (9160 by default) and the internode communication ports, unencrypted 
> (7000 by default) and SSL encrypted (7001 by default):
> https://github.com/apache/cassandra/search?utf8=%E2%9C%93&q=setReuseAddress
> It needs to be set on the native and JMX ports as well.
> References:
> https://unix.stackexchange.com/questions/258379/when-is-a-port-considered-being-used/258380?noredirect=1
> https://stackoverflow.com/questions/23531558/allow-restarting-java-application-with-jmx-monitoring-enabled-immediately
> https://docs.oracle.com/javase/8/docs/technotes/guides/rmi/socketfactory/
> https://github.com/apache/cassandra/search?utf8=%E2%9C%93&q=setReuseAddress
> https://docs.oracle.com/javase/7/docs/api/java/net/Socket.html#setReuseAddress%28boolean%293
> https://github.com/apache/cassandra/blob/4a0d1caa262af3b6f2b6d329e45766b4df845a88/tools/stress/src/org/apache/cassandra/stress/settings/SettingsPort.java#L38
> https://github.com/apache/cassandra/blob/4a0d1caa262af3b6f2b6d329e45766b4df845a88/tools/stress/src/org/apache/cassandra/stress/settings/SettingsPort.java#L40
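
For readers following the SO_REUSEADDR suggestion in the description above, here is a minimal sketch of binding a listening socket with that option using only the JDK. The class and method names (ReuseAddrSketch, bindWithReuseAddress) are hypothetical and this is not Cassandra's code. Note that it uses java.net.ServerSocket, which has its own setReuseAddress method, and that the option only has a defined effect if it is set before the socket is bound.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical illustration only; names are not taken from Cassandra.
public class ReuseAddrSketch {

    // Create an unbound server socket, enable SO_REUSEADDR, then bind.
    // Enabling the option after bind() has no defined effect.
    static ServerSocket bindWithReuseAddress(InetAddress addr, int port) throws IOException {
        ServerSocket server = new ServerSocket();   // unbound
        server.setReuseAddress(true);               // SO_REUSEADDR
        server.bind(new InetSocketAddress(addr, port));
        return server;
    }

    public static void main(String[] args) throws IOException {
        // Port 0 asks the OS for any free port, so the sketch runs anywhere.
        try (ServerSocket s = bindWithReuseAddress(InetAddress.getLoopbackAddress(), 0)) {
            System.out.println("bound to " + s.getLocalSocketAddress()
                    + ", SO_REUSEADDR=" + s.getReuseAddress());
        }
    }
}
```

The default value of SO_REUSEADDR on a new ServerSocket is platform dependent according to the JDK documentation, so setting it explicitly is the portable choice.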


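For the JMX port, the RMI socket factory guide listed in the references suggests one possible route: a custom RMIServerSocketFactory that enables SO_REUSEADDR before binding. The sketch below assumes that approach; the class name ReuseAddrRmiFactory is hypothetical, and nothing here is claimed to be how Cassandra configures JMX.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.rmi.server.RMIServerSocketFactory;

// Hypothetical illustration only; not Cassandra's JMX setup.
public class ReuseAddrRmiFactory implements RMIServerSocketFactory {

    @Override
    public ServerSocket createServerSocket(int port) throws IOException {
        ServerSocket server = new ServerSocket();   // unbound
        server.setReuseAddress(true);               // must precede bind()
        server.bind(new InetSocketAddress(port));   // wildcard address
        return server;
    }

    // RMI compares socket factories when deciding whether exported objects
    // can share a listening socket, so equality is defined by type.
    @Override
    public boolean equals(Object other) {
        return other != null && other.getClass() == getClass();
    }

    @Override
    public int hashCode() {
        return ReuseAddrRmiFactory.class.hashCode();
    }
}
```

Such a factory could be passed as the third argument to java.rmi.registry.LocateRegistry.createRegistry(int, RMIClientSocketFactory, RMIServerSocketFactory); whether that fits a given JMX bootstrap is something to verify against the actual setup.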

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
