[
https://issues.apache.org/jira/browse/CASSANDRA-11093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15209002#comment-15209002
]
varun commented on CASSANDRA-11093:
-----------------------------------
Hi. Yes, I was able to test, and the restart now seems to work fine. Thanks.
> process restarts are failing because native port and jmx ports are in use
> -------------------------------------------------------------------------
>
> Key: CASSANDRA-11093
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11093
> Project: Cassandra
> Issue Type: Bug
> Components: Configuration
> Environment: PROD
> Reporter: varun
> Priority: Minor
> Labels: lhf
>
> A process restart should take care of this automatically, but it does not,
> and that is a problem.
> The ports are considered in use even after the process has quit, died, or
> been killed, while a socket is still in the TIME_WAIT state of the TCP FSM
> (http://tcpipguide.com/free/t_TCPOperationalOverviewandtheTCPFiniteStateMachineF-2.htm).
> tcp 0 0 127.0.0.1:7199 0.0.0.0:* LISTEN 30099/java
> tcp 0 0 192.168.1.2:9160 0.0.0.0:* LISTEN 30099/java
> tcp 0 0 10.130.128.131:58263 10.130.128.131:9042 TIME_WAIT -
> tcp 0 0 10.130.128.131:58262 10.130.128.131:9042 TIME_WAIT -
> tcp 0 0 ::ffff:10.130.128.131:9042 :::* LISTEN 30099/java
> tcp 0 0 ::ffff:10.130.128.131:9042 ::ffff:10.130.128.131:57191 ESTABLISHED 30099/java
> tcp 0 0 ::ffff:10.130.128.131:9042 ::ffff:10.130.128.131:57190 ESTABLISHED 30099/java
> tcp 0 0 ::ffff:10.130.128.131:9042 ::ffff:10.176.70.226:37105 ESTABLISHED 30099/java
> tcp 0 0 ::ffff:127.0.0.1:42562 ::ffff:127.0.0.1:7199 TIME_WAIT -
> tcp 0 0 ::ffff:10.130.128.131:57190 ::ffff:10.130.128.131:9042 ESTABLISHED 30138/java
> tcp 0 0 ::ffff:10.130.128.131:57198 ::ffff:10.130.128.131:9042 ESTABLISHED 30138/java
> tcp 0 0 ::ffff:10.130.128.131:9042 ::ffff:10.176.70.226:37106 ESTABLISHED 30099/java
> tcp 0 0 ::ffff:10.130.128.131:57197 ::ffff:10.130.128.131:9042 ESTABLISHED 30138/java
> tcp 0 0 ::ffff:10.130.128.131:57191 ::ffff:10.130.128.131:9042 ESTABLISHED 30138/java
> tcp 0 0 ::ffff:10.130.128.131:9042 ::ffff:10.130.128.131:57198 ESTABLISHED 30099/java
> tcp 0 0 ::ffff:10.130.128.131:9042 ::ffff:10.130.128.131:57197 ESTABLISHED 30099/java
> tcp 0 0 ::ffff:127.0.0.1:42567 ::ffff:127.0.0.1:7199 TIME_WAIT -
> I had to write a restart handler that calls netstat and waits until all the
> TIME_WAIT sockets have expired before starting the node back up. This
> happened on 26 of the 56 nodes when a rolling restart was performed. The
> issue was mostly around JMX port 7199. Another rolling restart was done on
> those 26 nodes to remediate the JMX port issue. In that restart, one node
> hit the issue on port 9042: the port was considered in use after the
> restart, and the process died after a short time.
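The restart handler itself is not included in this report. As a rough, hypothetical sketch of the check it describes, the Java below scans netstat-style lines (in the format of the listing above) for TIME_WAIT sockets that involve a given set of ports. The class and method names are made up for illustration, and actually invoking netstat and sleeping between polls is left out.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

public class TimeWaitCheck {

    /** Returns the port after the last ':' of an address like "::ffff:127.0.0.1:7199", or -1. */
    static int portOf(String address) {
        int idx = address.lastIndexOf(':');
        if (idx < 0) {
            return -1;
        }
        try {
            return Integer.parseInt(address.substring(idx + 1));
        } catch (NumberFormatException e) {
            return -1; // e.g. the "*" in "0.0.0.0:*"
        }
    }

    /** True if any tcp line is in TIME_WAIT and its local or foreign port is one of the given ports. */
    static boolean hasTimeWait(List<String> netstatLines, Set<Integer> ports) {
        for (String line : netstatLines) {
            // Expected columns: proto recv-q send-q local foreign state [pid/program]
            String[] f = line.trim().split("\\s+");
            if (f.length < 6 || !f[0].startsWith("tcp")) {
                continue;
            }
            if (!"TIME_WAIT".equals(f[5])) {
                continue;
            }
            if (ports.contains(portOf(f[3])) || ports.contains(portOf(f[4]))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "tcp 0 0 127.0.0.1:7199 0.0.0.0:* LISTEN 30099/java",
            "tcp 0 0 ::ffff:127.0.0.1:42562 ::ffff:127.0.0.1:7199 TIME_WAIT -");
        System.out.println(hasTimeWait(sample, Collections.singleton(7199))); // prints true
        System.out.println(hasTimeWait(sample, Collections.singleton(9042))); // prints false
    }
}
```

A handler along these lines would re-run the check until it returns false and only then start the node.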
> What needs to be done for the native port 9042 and the JMX port 7199 is to
> create the underlying TCP socket with SO_REUSEADDR. This eases the
> restriction and allows the port to be bound by a process even if there are
> sockets on that port still in the TCP FSM (for example in TIME_WAIT), as
> long as no other process is listening on that port. There is a Java method
> available to set this option in java.net.Socket:
> https://docs.oracle.com/javase/7/docs/api/java/net/Socket.html#setReuseAddress%28boolean%29
> native port 9042:
> https://github.com/apache/cassandra/blob/4a0d1caa262af3b6f2b6d329e45766b4df845a88/tools/stress/src/org/apache/cassandra/stress/settings/SettingsPort.java#L38
> JMX port 7199:
> https://github.com/apache/cassandra/blob/4a0d1caa262af3b6f2b6d329e45766b4df845a88/tools/stress/src/org/apache/cassandra/stress/settings/SettingsPort.java#L40
> Looking at the code itself, this option is already set on the Thrift port
> (9160 by default) and on the internode communication ports, unencrypted
> (7000 by default) and SSL-encrypted (7001 by default):
> https://github.com/apache/cassandra/search?utf8=%E2%9C%93&q=setReuseAddress
> This needs to be set on the native and JMX ports as well.
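To illustrate the proposed fix, here is a minimal Java sketch that enables SO_REUSEADDR on a listening socket before binding it, plus an RMI server socket factory that does the same, which is one way a JMX/RMI port could pick up the option. This is not Cassandra's actual code: the class and method names are hypothetical, and how Cassandra really creates its native-transport and JMX sockets is not shown in this report.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.rmi.server.RMIServerSocketFactory;

public class ReuseAddressBind {

    /** Opens a listening socket with SO_REUSEADDR enabled before it is bound. */
    static ServerSocket bindWithReuse(InetAddress addr, int port) throws IOException {
        ServerSocket server = new ServerSocket(); // created unbound
        try {
            server.setReuseAddress(true); // must be set before bind() to take effect
            server.bind(new InetSocketAddress(addr, port));
            return server;
        } catch (IOException e) {
            server.close();
            throw e;
        }
    }

    /** Hypothetical RMI factory whose server sockets are created through bindWithReuse. */
    static final class ReuseAddressFactory implements RMIServerSocketFactory {
        private final InetAddress addr;

        ReuseAddressFactory(InetAddress addr) {
            this.addr = addr;
        }

        @Override
        public ServerSocket createServerSocket(int port) throws IOException {
            return bindWithReuse(addr, port);
        }
    }

    public static void main(String[] args) throws IOException {
        // Port 0 asks the OS for a free ephemeral port, so the demo never collides with a real service.
        try (ServerSocket s = bindWithReuse(InetAddress.getLoopbackAddress(), 0)) {
            System.out.println("reuseAddress=" + s.getReuseAddress() + " port=" + s.getLocalPort());
        }
    }
}
```

The option has to be set on an unbound socket, which is why the no-argument ServerSocket constructor is used instead of the one that binds immediately.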
> References:
> https://unix.stackexchange.com/questions/258379/when-is-a-port-considered-being-used/258380?noredirect=1
> https://stackoverflow.com/questions/23531558/allow-restarting-java-application-with-jmx-monitoring-enabled-immediately
> https://docs.oracle.com/javase/8/docs/technotes/guides/rmi/socketfactory/
> https://github.com/apache/cassandra/search?utf8=%E2%9C%93&q=setReuseAddress
> https://docs.oracle.com/javase/7/docs/api/java/net/Socket.html#setReuseAddress%28boolean%29
> https://github.com/apache/cassandra/blob/4a0d1caa262af3b6f2b6d329e45766b4df845a88/tools/stress/src/org/apache/cassandra/stress/settings/SettingsPort.java#L38
> https://github.com/apache/cassandra/blob/4a0d1caa262af3b6f2b6d329e45766b4df845a88/tools/stress/src/org/apache/cassandra/stress/settings/SettingsPort.java#L40
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)