[ 
https://issues.apache.org/jira/browse/CASSANDRA-11093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15189503#comment-15189503
 ] 

Sam Tunnicliffe edited comment on CASSANDRA-11093 at 3/11/16 9:16 AM:
----------------------------------------------------------------------

AFAICT, the {{SO_REUSEADDR}} option only needs to be explicitly set (by C*) in 
{{RMIServerSocketFactoryImpl}}. Both the Java NIO and Netty native epoll 
{{ServerSocket}} implementations set the reuse address flag to true by default: 
[jdk|http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/sun/nio/ch/Net.java#l389]
 & 
[netty|https://github.com/netty/netty/blob/netty-4.0.23.Final/transport-native-epoll/src/main/java/io/netty/channel/epoll/EpollServerSocketChannelConfig.java#L42].
 The native protocol server always uses one of these two implementations, so 
those sockets already have address reuse enabled.
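
For anyone who wants to check the NIO default on their own platform, here is a minimal sketch using only the standard {{java.nio}} API. The class and method names are made up for illustration and nothing here is C* code. The default value may differ between operating systems and JDK builds, so the sketch prints it rather than assuming it.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.StandardSocketOptions;
import java.nio.channels.ServerSocketChannel;

// Hypothetical helper class, not part of Cassandra.
public class ReuseAddressCheck {

    /** Returns SO_REUSEADDR as reported by a freshly opened, unbound NIO server channel. */
    static boolean defaultReuseAddress() throws IOException {
        try (ServerSocketChannel ch = ServerSocketChannel.open()) {
            return ch.getOption(StandardSocketOptions.SO_REUSEADDR);
        }
    }

    /** Sets SO_REUSEADDR before binding to an ephemeral loopback port, then reads it back. */
    static boolean explicitReuseAddress() throws IOException {
        try (ServerSocketChannel ch = ServerSocketChannel.open()) {
            // The option must be set before bind() to affect the bind itself.
            ch.setOption(StandardSocketOptions.SO_REUSEADDR, true);
            ch.bind(new InetSocketAddress("127.0.0.1", 0));
            return ch.getOption(StandardSocketOptions.SO_REUSEADDR);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("default SO_REUSEADDR:  " + defaultReuseAddress());
        System.out.println("explicit SO_REUSEADDR: " + explicitReuseAddress());
    }
}
```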

FTR, there is also an undocumented bug in the version of Netty we currently use 
which causes a {{java.lang.UnsatisfiedLinkError}} when querying 
{{SO_REUSEADDR}} on the native epoll transport, so running the new 
{{ServerTest}} under Linux fails. This bug is present in the latest released 
version of Netty 4.0, 4.0.34.Final, but has since been fixed on the 4.0 branch. 
 

As far as the RMI server goes, on my Linux box running JDK 1.8.0_74, the 
{{ServerSocket}} created by the default factory already has reuse enabled, but 
this default value is documented as undefined, so explicitly setting it here 
seems reasonable. [~Gerrrr], what are the details of the environment where 
you're seeing this? Also, can you confirm that the change to 
{{RMIServerSocketFactoryImpl}} is sufficient to fix your problem?
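
To illustrate what "explicitly setting it" could look like, here is a rough sketch of an {{RMIServerSocketFactory}} that enables {{SO_REUSEADDR}} before binding. This is not the actual {{RMIServerSocketFactoryImpl}} from the patch; the class name {{ReuseAddrServerSocketFactory}} and its {{bindAddress}} field are assumptions made for the example. The key point is that the socket has to be created unbound, since the option only influences the bind if it is set first.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.rmi.server.RMIServerSocketFactory;

// Hypothetical factory for illustration; not Cassandra's RMIServerSocketFactoryImpl.
public class ReuseAddrServerSocketFactory implements RMIServerSocketFactory {

    private final InetAddress bindAddress; // assumed: the address the JMX/RMI server listens on

    public ReuseAddrServerSocketFactory(InetAddress bindAddress) {
        this.bindAddress = bindAddress;
    }

    @Override
    public ServerSocket createServerSocket(int port) throws IOException {
        // Create the socket unbound so SO_REUSEADDR can be set before bind().
        ServerSocket socket = new ServerSocket();
        try {
            socket.setReuseAddress(true);
            socket.bind(new InetSocketAddress(bindAddress, port));
            return socket;
        } catch (IOException e) {
            socket.close(); // do not leak the descriptor if the bind fails
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        RMIServerSocketFactory factory =
                new ReuseAddrServerSocketFactory(InetAddress.getLoopbackAddress());
        // Port 0 asks the OS for any free port, which keeps the demo safe to run anywhere.
        try (ServerSocket socket = factory.createServerSocket(0)) {
            System.out.println("bound to port " + socket.getLocalPort()
                    + ", reuseAddress=" + socket.getReuseAddress());
        }
    }
}
```

A real factory passed to the RMI runtime would normally also implement {{equals}} and {{hashCode}}; those are left out here to keep the sketch short.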

EDIT: sorry, I realized I'd directed the question about reproducibility to 
[~Gerrr], when I meant to ping [~varun] as the reporter




> process restarts are failing because native port and jmx ports are in use
> -------------------------------------------------------------------------
>
>                 Key: CASSANDRA-11093
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11093
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Configuration
>         Environment: PROD
>            Reporter: varun
>            Priority: Minor
>              Labels: lhf
>
> A process restart should take care of this automatically, but it does not, 
> and that is a problem.
> The ports are considered in use even after the process has quit, died, or 
> been killed, while sockets remain in the TIME_WAIT state of the TCP FSM 
> (http://tcpipguide.com/free/t_TCPOperationalOverviewandtheTCPFiniteStateMachineF-2.htm).
> tcp 0 0 127.0.0.1:7199 0.0.0.0:* LISTEN 30099/java
> tcp 0 0 192.168.1.2:9160 0.0.0.0:* LISTEN 30099/java
> tcp 0 0 10.130.128.131:58263 10.130.128.131:9042 TIME_WAIT -
> tcp 0 0 10.130.128.131:58262 10.130.128.131:9042 TIME_WAIT -
> tcp 0 0 ::ffff:10.130.128.131:9042 :::* LISTEN 30099/java
> tcp 0 0 ::ffff:10.130.128.131:9042 ::ffff:10.130.128.131:57191 ESTABLISHED 30099/java
> tcp 0 0 ::ffff:10.130.128.131:9042 ::ffff:10.130.128.131:57190 ESTABLISHED 30099/java
> tcp 0 0 ::ffff:10.130.128.131:9042 ::ffff:10.176.70.226:37105 ESTABLISHED 30099/java
> tcp 0 0 ::ffff:127.0.0.1:42562 ::ffff:127.0.0.1:7199 TIME_WAIT -
> tcp 0 0 ::ffff:10.130.128.131:57190 ::ffff:10.130.128.131:9042 ESTABLISHED 30138/java
> tcp 0 0 ::ffff:10.130.128.131:57198 ::ffff:10.130.128.131:9042 ESTABLISHED 30138/java
> tcp 0 0 ::ffff:10.130.128.131:9042 ::ffff:10.176.70.226:37106 ESTABLISHED 30099/java
> tcp 0 0 ::ffff:10.130.128.131:57197 ::ffff:10.130.128.131:9042 ESTABLISHED 30138/java
> tcp 0 0 ::ffff:10.130.128.131:57191 ::ffff:10.130.128.131:9042 ESTABLISHED 30138/java
> tcp 0 0 ::ffff:10.130.128.131:9042 ::ffff:10.130.128.131:57198 ESTABLISHED 30099/java
> tcp 0 0 ::ffff:10.130.128.131:9042 ::ffff:10.130.128.131:57197 ESTABLISHED 30099/java
> tcp 0 0 ::ffff:127.0.0.1:42567 ::ffff:127.0.0.1:7199 TIME_WAIT -
> I had to write a restart handler that calls netstat and waits until all 
> the TIME_WAIT states have expired before starting the node back up. This 
> happened on 26 of the 56 nodes when a rolling restart was performed. The 
> issue was mostly around JMX port 7199. Another rolling restart was done on 
> those 26 nodes to remediate the JMX port issue; in that restart, one node 
> had the issue where port 9042 was considered in use after the restart, and 
> the process died after a short time.
> What needs to be done for the native port 9042 and JMX port 7199 is to 
> create the underlying TCP socket with SO_REUSEADDR. This eases the 
> restriction and allows the port to be bound by a process even if there are 
> sockets open to that port in the TCP FSM, as long as no other process is 
> listening on that port. There is a Java method available to set this 
> option in java.net.Socket: 
> https://docs.oracle.com/javase/7/docs/api/java/net/Socket.html#setReuseAddress%28boolean%29.
> native port 9042: 
> https://github.com/apache/cassandra/blob/4a0d1caa262af3b6f2b6d329e45766b4df845a88/tools/stress/src/org/apache/cassandra/stress/settings/SettingsPort.java#L38
> JMX port 7199: 
> https://github.com/apache/cassandra/blob/4a0d1caa262af3b6f2b6d329e45766b4df845a88/tools/stress/src/org/apache/cassandra/stress/settings/SettingsPort.java#L40
> Looking at the code itself, this option is already set on the thrift (9160 
> (default)) and internode communication ports, unencrypted (7000 (default)) 
> and SSL encrypted (7001 (default)).
> https://github.com/apache/cassandra/search?utf8=%E2%9C%93&q=setReuseAddress
> This needs to be set on the native and JMX ports as well.
> References:
> https://unix.stackexchange.com/questions/258379/when-is-a-port-considered-being-used/258380?noredirect=1
> https://stackoverflow.com/questions/23531558/allow-restarting-java-application-with-jmx-monitoring-enabled-immediately
> https://docs.oracle.com/javase/8/docs/technotes/guides/rmi/socketfactory/
> https://github.com/apache/cassandra/search?utf8=%E2%9C%93&q=setReuseAddress
> https://docs.oracle.com/javase/7/docs/api/java/net/Socket.html#setReuseAddress%28boolean%29
> https://github.com/apache/cassandra/blob/4a0d1caa262af3b6f2b6d329e45766b4df845a88/tools/stress/src/org/apache/cassandra/stress/settings/SettingsPort.java#L38
> https://github.com/apache/cassandra/blob/4a0d1caa262af3b6f2b6d329e45766b4df845a88/tools/stress/src/org/apache/cassandra/stress/settings/SettingsPort.java#L40



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
