Aswin Karthik created CASSANDRA-18053:
-----------------------------------------
Summary: Node disconnection during cassandra 4.0 upgrade from
cassandra 3.11
Key: CASSANDRA-18053
URL: https://issues.apache.org/jira/browse/CASSANDRA-18053
Project: Cassandra
Issue Type: Bug
Reporter: Aswin Karthik
We are running Cassandra 3.11.11 and upgrading to 4.0.5.
The nodes use 11044 as their storage port.
Our upgrade process is the usual:
* Boot cassandra 4.0.5 using 3.11.11 data disk
* Run upgradesstables
However, during the upgrade, a node is occasionally unable to connect to the
other nodes in the cluster. This happens intermittently and is fixed by a
restart.
On further diagnosis, we found that the problematic node uses port 7000 for
some communication instead of the configured port:
{noformat}
InboundConnectionInitiator.java:127 - Listening on address:
(node-1.dev/x.x.x.x:11044), nic: eth0, encryption: optionally encrypted(openssl)
OutboundConnection.java:1150 -
node-1.dev/x.x.x.x:7000(/x.x.x.x:50424)->/y.y.y.y:11044-URGENT_MESSAGES-3c193918
successfully connected, version = 12, framing = LZ4, encryption =
encrypted(factory=openssl;protocol=TLSv1.2;cipher=TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384)
{noformat}
Notice x.x.x.x:7000 in the OutboundConnection log line, even though x.x.x.x is
listening on 11044.
This is fixed by a restart.
The logs after the restart:
{noformat}
InboundConnectionInitiator.java:127 - Listening on address: (/x.x.x.x:11044),
nic: eth0, encryption: optionally encrypted(openssl)
InboundConnectionInitiator.java:464 -
/y.y.y.y:11044(/y.y.y.y:40656)->/x.x.x.x:11044-URGENT_MESSAGES-cade4755
messaging connection established, version = 12, framing = CRC, encryption =
encrypted(factory=openssl;protocol=TLSv1.2;cipher=TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384)
OutboundConnection.java:1150 -
/x.x.x.x:11044(/x.x.x.x:53316)->/y.y.y.y:11044-URGENT_MESSAGES-92d99f23
successfully connected, version = 12, framing = LZ4, encryption =
encrypted(factory=openssl;protocol=TLSv1.2;cipher=TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384)
{noformat}
Notice the Outbound connection log line has x.x.x.x:11044 this time.
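To spot affected nodes without reading the logs by hand, the comparison above can be sketched as a small script. This is only an illustration, not part of Cassandra: the function name `find_port_mismatches`, the constant `CONFIGURED_STORAGE_PORT`, and the regular expression are assumptions based on the connection format seen in the log lines quoted here (`local:port(/actual)->remote`), and the pattern assumes IPv4-style addresses.

```python
import re

# Assumption: the storage_port value from cassandra.yaml in this report.
CONFIGURED_STORAGE_PORT = 11044

# Matches the "local:port(/actual:port)->" part of a connection log entry,
# e.g. "node-1.dev/x.x.x.x:7000(/x.x.x.x:50424)->/y.y.y.y:11044-...".
ENDPOINT_RE = re.compile(
    r"(?P<endpoint>[^\s()]*/[^\s:()]+):(?P<port>\d+)\(/[^)]+\)->"
)


def find_port_mismatches(lines, expected_port=CONFIGURED_STORAGE_PORT):
    """Return (endpoint, port) pairs whose advertised port is not expected_port."""
    mismatches = []
    for line in lines:
        for match in ENDPOINT_RE.finditer(line):
            port = int(match.group("port"))
            if port != expected_port:
                mismatches.append((match.group("endpoint"), port))
    return mismatches


if __name__ == "__main__":
    sample = [
        "node-1.dev/x.x.x.x:7000(/x.x.x.x:50424)->/y.y.y.y:11044-URGENT_MESSAGES-3c193918",
        "/x.x.x.x:11044(/x.x.x.x:53316)->/y.y.y.y:11044-URGENT_MESSAGES-92d99f23",
    ]
    for endpoint, port in find_port_mismatches(sample):
        print(f"{endpoint} advertises port {port}, expected {CONFIGURED_STORAGE_PORT}")
```

Running it over the lines of system.log on each node would flag only the entries where the local endpoint is reported with a port other than the configured one.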
The issue occurs intermittently, with no pattern we can identify.
This looks like a bug. Is there a fix for it? Are we missing some steps during
the upgrade?
Relevant sections of cassandra.yaml, for both Cassandra 3.x and 4.x:
{noformat}
storage_port: 11044
ssl_storage_port: 11044
server_encryption_options:
    internode_encryption: all
    keystore: ---------
    keystore_password: -------
    truststore: ---------
    truststore_password: ---------
    protocol: TLSv1.2
    algorithm: PKIX
    store_type: PKCS12
    cipher_suites:
        - TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
        - TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
    require_client_auth: true
{noformat}