[
https://issues.apache.org/jira/browse/CASSANDRA-18075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17640802#comment-17640802
]
Alaykumar Barochia commented on CASSANDRA-18075:
------------------------------------------------
[~brandon.williams] - When {{storage_port}} and {{ssl_storage_port}} are not
defined explicitly, they take their default values, {{7000}} and {{7001}}
respectively. In my setup, both the 3.11.4 and 4.0.4 nodes use these default
values.
The snippets below, from the system.log files of both versions, clearly show
ports {{7000}} and {{7001}} being used for {{storage_port}} and
{{ssl_storage_port}} respectively.
*4.0.4:*
{noformat}
47; row_cache_save_period=0; row_cache_size_in_mb=0; rpc_address=0.0.0.0;
rpc_interface=null; rpc_interface_prefer_ipv6=false; rpc_keepalive=true;
saved_caches_directory=/data/saved_caches;
seed_provider=org.apache.cassandra.locator.SimpleSeedProvider{seeds=10.109.45.8,10.109.6.153,10.110.44.207,10.110.44.220};
server_encryption_options=<REDACTED>; slow_query_log_timeout_in_ms=500;
snapshot_before_compaction=false; snapshot_links_per_second=0;
snapshot_on_duplicate_row_detection=false;
snapshot_on_repaired_data_mismatch=false; ssl_storage_port=7001;
sstable_preemptive_open_interval_in_mb=50; start_native_transport=true;
storage_port=7000; stream_entire_sstables=true;
stream_throughput_outbound_megabits_per_sec=200;
streaming_connections_per_host=1; streaming_keep_alive_period_in_secs=300;
table_count_warn_threshold=150; tombstone_failure_threshold=100000;
tombstone_warn_threshold=1000; tracetype_query_ttl=86400;
tracetype_repair_ttl=604800;
transparent_data_encryption_options=org.apache.cassandra.conf
{noformat}
*3.11.4:*
{noformat}
es=null; rpc_send_buff_size_in_bytes=null; rpc_server_type=sync;
saved_caches_directory=/data/saved_caches;
seed_provider=org.apache.cassandra.locator.SimpleSeedProvider{seeds=10.109.45.8,10.109.6.153,10.110.4.110,10.110.44.220};
server_encryption_options=<REDACTED>; slow_query_log_timeout_in_ms=500;
snapshot_before_compaction=false; ssl_storage_port=7001;
sstable_preemptive_open_interval_in_mb=50; start_native_transport=true;
start_rpc=false; storage_port=7000;
stream_throughput_outbound_megabits_per_sec=200;
streaming_keep_alive_period_in_secs=300;
streaming_socket_timeout_in_ms=86400000; thrift_framed_transport_size_in_mb=15
{noformat}
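To check the same two settings in another node's system.log, a minimal Python sketch along these lines may help. It assumes the startup config dump is a flat list of {{key=value;}} pairs, as in the excerpts above; {{parse_config_dump}} is a hypothetical helper name, not part of Cassandra.

```python
import re


def parse_config_dump(text):
    """Parse 'key=value; key=value' pairs from a Cassandra startup config dump.

    Hypothetical helper for illustration only; assumes the flat
    'key=value;' layout seen in the system.log excerpts.
    """
    # Rejoin lines that were wrapped in the log excerpt, then split on ';'.
    flat = " ".join(line.strip() for line in text.splitlines())
    settings = {}
    for item in flat.split(";"):
        # Split on the first '=' only, so values such as
        # 'SimpleSeedProvider{seeds=...}' are kept intact.
        key, sep, value = item.strip().partition("=")
        if sep and re.fullmatch(r"\w+", key):
            settings[key] = value.strip()
    return settings


sample = """
snapshot_on_repaired_data_mismatch=false; ssl_storage_port=7001;
sstable_preemptive_open_interval_in_mb=50; start_native_transport=true;
storage_port=7000; stream_entire_sstables=true;
"""

cfg = parse_config_dump(sample)
print(cfg["storage_port"], cfg["ssl_storage_port"])
```

Fragments truncated at the start of an excerpt (such as {{47;}}) carry no {{=}} and are skipped; a truncated key such as {{es=null}} would still be parsed under its partial name, so only complete keys should be trusted.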
> Upgraded (C* 4.0.4) node stops communicating with older version (3.11.4)
> nodes during upgrade
> ---------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-18075
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18075
> Project: Cassandra
> Issue Type: Bug
> Components: Feature/Encryption
> Reporter: Alaykumar Barochia
> Priority: Normal
> Attachments: cassandra-env.sh_3114, cassandra-env.sh_404,
> cassandra.yaml_3114, cassandra.yaml_404, system.log_10.110.44.207
>
>
> We are testing an upgrade from Cassandra 3.11.4 to 4.0.4 on our SSL-enabled
> test cluster and are facing an issue.
> Our cluster size is 3x3.
> {noformat}
> Datacenter: abssl_dev_tap_ttc
> =============================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address        Load        Tokens  Owns (effective)  Host ID                               Rack
> UN  10.109.6.153   94.27 KiB   16      100.0%            130e59d2-2a9a-4039-a42f-deb20afcf288  rack1
> UN  10.109.45.8    104.43 KiB  16      100.0%            35274a2c-f915-4308-9981-d207a4e2108f  rack1
> UN  10.109.66.149  104.23 KiB  16      100.0%            ea0151bc-fb6c-425d-af42-75c10e52f941  rack1
> Datacenter: abssl_dev_tap_tte
> =============================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address        Load        Tokens  Owns (effective)  Host ID                               Rack
> UN  10.110.4.110   104.44 KiB  16      100.0%            fd4a9fa8-f2a9-494c-afb8-7cb8a08c7554  rack1
> UN  10.110.44.220  99.33 KiB   16      100.0%            f1dc35c0-a1c2-45fe-9f65-b1cc3d7f6947  rack1
> UN  10.110.49.242  65.57 KiB   16      100.0%            72bc4ae5-876d-4d0a-91ac-6cf8b531b4dd  rack1
> dbaasprod-ca-abssl-de-393671-v001-yqlvf:~# nodetool describecluster
> Cluster Information:
> Name: abssl_dev
> Snitch: org.apache.cassandra.locator.GossipingPropertyFileSnitch
> DynamicEndPointSnitch: enabled
> Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
> Schema versions:
> f68fbc0c-c9d6-3709-8075-c5a0d74192f2: [10.110.4.110, 10.110.44.220, 10.109.6.153, 10.109.45.8, 10.109.66.149, 10.110.49.242]
> {noformat}
> During the upgrade, we re-run the pipeline, which provisions a new server
> (with a different IP) running the Cassandra 4.0.4 binary.
> The '/data' disk (containing data files, commitlogs, etc.) is detached from
> the old server and attached to the new server.
> This process works fine on a non-SSL cluster, but when we perform it on an SSL
> cluster, the new node stops communicating with the rest of the nodes.
> In this example, after the upgrade, node 10.110.4.110 was replaced with a new
> server with the new IP 10.110.44.207.
> *Output from 3.11.4 node:*
> {noformat}
> dbaasprod-ca-abssl-dc-437097-v001-7mump:~# hostname -i
> 10.109.6.153
> dbaasprod-ca-abssl-dc-437097-v001-7mump:~# java -version
> openjdk version "1.8.0_322"
> OpenJDK Runtime Environment (Temurin)(build 1.8.0_322-b06)
> OpenJDK 64-Bit Server VM (Temurin)(build 25.322-b06, mixed mode)
> dbaasprod-ca-abssl-dc-437097-v001-7mump:~# nodetool status
> Datacenter: abssl_dev_tap_ttc
> =============================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address        Load        Tokens  Owns (effective)  Host ID                               Rack
> UN  10.109.6.153   135.24 KiB  16      100.0%            130e59d2-2a9a-4039-a42f-deb20afcf288  rack1
> UN  10.109.45.8    135.35 KiB  16      100.0%            35274a2c-f915-4308-9981-d207a4e2108f  rack1
> UN  10.109.66.149  135.25 KiB  16      100.0%            ea0151bc-fb6c-425d-af42-75c10e52f941  rack1
> Datacenter: abssl_dev_tap_tte
> =============================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address        Load        Tokens  Owns (effective)  Host ID                               Rack
> DN  10.110.4.110   104.44 KiB  16      100.0%            fd4a9fa8-f2a9-494c-afb8-7cb8a08c7554  rack1
> UN  10.110.44.220  104.44 KiB  16      100.0%            f1dc35c0-a1c2-45fe-9f65-b1cc3d7f6947  rack1
> UN  10.110.49.242  65.57 KiB   16      100.0%            72bc4ae5-876d-4d0a-91ac-6cf8b531b4dd  rack1
> dbaasprod-ca-abssl-dc-437097-v001-7mump:~# nodetool describecluster
> Cluster Information:
> Name: abssl_dev
> Snitch: org.apache.cassandra.locator.GossipingPropertyFileSnitch
> DynamicEndPointSnitch: enabled
> Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
> Schema versions:
> f68fbc0c-c9d6-3709-8075-c5a0d74192f2: [10.110.44.220, 10.109.6.153, 10.109.45.8, 10.109.66.149, 10.110.49.242]
> UNREACHABLE: [10.110.4.110]
> {noformat}
> *Output from 4.0.4 node:*
> {noformat}
> dbaasprod-ca-abssl-de-393671-v003-dxpyv:~# hostname -i
> 10.110.44.207
> dbaasprod-ca-abssl-de-393671-v003-dxpyv:~# java -version
> openjdk version "11.0.15" 2022-04-19
> OpenJDK Runtime Environment Temurin-11.0.15+10 (build 11.0.15+10)
> OpenJDK 64-Bit Server VM Temurin-11.0.15+10 (build 11.0.15+10, mixed mode)
> dbaasprod-ca-abssl-de-393671-v003-dxpyv:~# nodetool status
> Datacenter: DC1
> ===============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address        Load  Tokens  Owns (effective)  Host ID                               Rack
> DN  10.109.6.153   ?     16      0.0%              130e59d2-2a9a-4039-a42f-deb20afcf288  r1
> DN  10.109.45.8    ?     16      0.0%              35274a2c-f915-4308-9981-d207a4e2108f  r1
> DN  10.109.66.149  ?     16      0.0%              ea0151bc-fb6c-425d-af42-75c10e52f941  r1
> DN  10.110.44.220  ?     16      0.0%              f1dc35c0-a1c2-45fe-9f65-b1cc3d7f6947  r1
> DN  10.110.49.242  ?     16      0.0%              72bc4ae5-876d-4d0a-91ac-6cf8b531b4dd  r1
> Datacenter: abssl_dev_tap_tte
> =============================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address        Load        Tokens  Owns (effective)  Host ID                               Rack
> UN  10.110.44.207  146.27 KiB  16      100.0%            fd4a9fa8-f2a9-494c-afb8-7cb8a08c7554  rack1
> dbaasprod-ca-abssl-de-393671-v003-dxpyv:~# nodetool describecluster
> Cluster Information:
> Name: abssl_dev
> Snitch: org.apache.cassandra.locator.GossipingPropertyFileSnitch
> DynamicEndPointSnitch: disabled
> Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
> Schema versions:
> 1ccaeb62-5816-3599-897f-de59fd56eef2: [10.110.44.207]
> UNREACHABLE: [10.109.45.8, 10.109.66.149, 10.110.44.220, 10.109.6.153, 10.110.49.242]
> Stats for all nodes:
> Live: 1
> Joining: 0
> Moving: 0
> Leaving: 0
> Unreachable: 5
> Data Centers:
> DC1 #Nodes: 5 #Down: 0
> abssl_dev_tap_tte #Nodes: 1 #Down: 0
> Database versions:
> : [10.109.45.8:7000, 10.109.66.149:7000, 10.110.44.220:7000, 10.109.6.153:7000, 10.110.49.242:7000]
> 4.0.4: [10.110.44.207:7000]
> Keyspaces:
> system_schema -> Replication class: LocalStrategy {}
> system -> Replication class: LocalStrategy {}
> system_auth -> Replication class: NetworkTopologyStrategy {abssl_dev_tap_tte=3, abssl_dev_tap_ttc=3}
> system_distributed -> Replication class: NetworkTopologyStrategy {abssl_dev_tap_tte=3, abssl_dev_tap_ttc=3}
> system_traces -> Replication class: NetworkTopologyStrategy {abssl_dev_tap_tte=3, abssl_dev_tap_ttc=3}
> {noformat}
> We see the following errors in the system.log file of the new node
> 10.110.44.207, which runs Cassandra 4.0.4.
> {noformat}
> WARN [Messaging-EventLoop-3-6] 2022-11-28 06:20:49,577 NoSpamLogger.java:95 - /10.110.44.207:7000->/10.109.45.8:7000-URGENT_MESSAGES-[no-channel] dropping message of type GOSSIP_DIGEST_SYN whose timeout expired before reaching the network
> INFO [Messaging-EventLoop-3-6] 2022-11-28 06:21:17,921 NoSpamLogger.java:92 - /10.110.44.207:7000->/10.110.49.242:7000-URGENT_MESSAGES-[no-channel] failed to connect
> io.netty.channel.AbstractChannel$AnnotatedConnectException: finishConnect(..) failed: Connection refused: /10.110.49.242:7000
> Caused by: java.net.ConnectException: finishConnect(..) failed: Connection refused
>     at io.netty.channel.unix.Errors.throwConnectException(Errors.java:124)
>     at io.netty.channel.unix.Socket.finishConnect(Socket.java:251)
>     at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.doFinishConnect(AbstractEpollChannel.java:673)
>     at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.finishConnect(AbstractEpollChannel.java:650)
>     at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollOutReady(AbstractEpollChannel.java:530)
>     at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:470)
>     at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
>     at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
>     at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>     at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>     at java.base/java.lang.Thread.run(Thread.java:829)
> {noformat}
> I am attaching the cassandra.yaml and cassandra-env.sh files from both versions
> (3.11.4 and 4.0.4), as well as the system.log file from the upgraded node
> 10.110.44.207.
> This looks like a bug, hence this Jira. Could you please have a look?
> Let me know if you need any more details.
> Thanks,
> Alaykumar Barochia