[ https://issues.apache.org/jira/browse/CASSANDRA-18075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17722053#comment-17722053 ]

Alaykumar Barochia commented on CASSANDRA-18075:
------------------------------------------------

I already tried option 1 by setting ssl_storage_port to 7001 on the 4.0 node. It didn't help.
Also, the firewall already has both ports 7000 and 7001 open in TAP, so option 2 is also ruled out.
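One way to double-check the firewall conclusion from each side is a plain TCP probe of ports 7000 and 7001 between the upgraded node and the 3.11 nodes. A "connection refused" (as in the 4.0.4 log in the original report below) usually means the peer host answered but nothing was listening on that port, whereas a timeout usually points at packets being silently dropped. The following is a minimal Python 3 sketch; `probe` and `probe_all` are hypothetical helpers, not Cassandra tooling, and they only check TCP reachability, not the TLS handshake.

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a plain TCP connect attempt to host:port (no TLS handshake).

    "open"    - a listener accepted the TCP connection
    "refused" - the peer sent a reset, which usually means nothing listens there
    "timeout" - no reply at all, which usually means packets are being dropped
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        # socket.timeout is an alias of TimeoutError on Python 3.10+
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. "No route to host" or a name-resolution failure
        return f"error: {exc.strerror or exc}"


def probe_all(peers, ports=(7000, 7001), timeout: float = 3.0) -> dict:
    """Probe every (peer, port) pair; the default ports are 7000 and 7001."""
    return {(peer, port): probe(peer, port, timeout) for peer in peers for port in ports}
```

For example, running `probe_all(["10.109.44.76", "10.109.44.177"])` on 10.109.220.200, and the reverse from a 3.11 node, would show whether each side can reach the other's listening ports at all.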

Today I tried the opposite: I made the 3.11 cluster use 7000 for SSL/TLS and then tried upgrading to 4.0. Still the same issue.

*3.11.4 cluster (ssl_storage_port set to 7000):*

{noformat}
Datacenter: c3ssl_dev_tap_ttc
=============================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens       Owns (effective)  Host ID                               Rack
UN  10.109.44.76   128.32 KiB  16           63.1%             325e24b3-81b9-4d19-abaf-1bb61f662be5  rack1
UN  10.109.30.228  153.03 KiB  16           71.1%             4d1ff6ec-d781-474d-9862-4b31f1f583fe  rack1
UN  10.109.44.177  152.91 KiB  16           65.8%             bb4000ce-8f87-4c8a-aefe-0bc26143c2d3  rack1
{noformat}
I upgraded node {{10.109.30.228}} first; its new IP is {{10.109.220.200}}.
*The new node stopped communicating with the other nodes.*

*From node 10.109.44.76:*

{noformat}
Datacenter: c3ssl_dev_tap_ttc
=============================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens       Owns (effective)  Host ID                               Rack
UN  10.109.44.76   128.32 KiB  16           63.1%             325e24b3-81b9-4d19-abaf-1bb61f662be5  rack1
DN  10.109.30.228  128.32 KiB  16           71.1%             4d1ff6ec-d781-474d-9862-4b31f1f583fe  rack1
UN  10.109.44.177  128.22 KiB  16           65.8%             bb4000ce-8f87-4c8a-aefe-0bc26143c2d3  rack1
{noformat}

*From node 10.109.220.200:*

{noformat}
dbaasstg-ca-c3ssl-dc-834204-v002-1s7rs:/usr/lib/cassandra/logs# nodetool status
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load        Tokens  Owns (effective)  Host ID                               Rack
DN  10.109.44.177   ?           16      65.8%             bb4000ce-8f87-4c8a-aefe-0bc26143c2d3  r1
DN  10.109.44.76    ?           16      63.1%             325e24b3-81b9-4d19-abaf-1bb61f662be5  r1

Datacenter: c3ssl_dev_tap_ttc
=============================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load        Tokens  Owns (effective)  Host ID                               Rack
UN  10.109.220.200  212.45 KiB  16      71.1%             4d1ff6ec-d781-474d-9862-4b31f1f583fe  rack1
{noformat}


> Upgraded (C* 4.0.4) node stops communicating with older version (3.11.4) nodes during upgrade
> ---------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-18075
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18075
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Feature/Encryption
>            Reporter: Alaykumar Barochia
>            Priority: Normal
>         Attachments: In-place-upgrade.zip, cassandra-env.sh_3114, cassandra-env.sh_404, cassandra.yaml_10.110.44.207_explicitely_set_port, cassandra.yaml_10.110.49.242_explicitely_set_port, cassandra.yaml_3114, cassandra.yaml_404, system.log_10.110.44.207, system.log_10.110.44.207_after_explicitely_set_port, system.log_10.110.49.242_after_explicitely_set_port
>
>
> We are testing an upgrade from Cassandra 3.11.4 to 4.0.4 on our SSL-enabled test cluster and are facing an issue.
> Our cluster has two datacenters with three nodes each (3x3).
> {noformat}
> Datacenter: abssl_dev_tap_ttc
> =============================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address        Load       Tokens       Owns (effective)  Host ID                               Rack
> UN  10.109.6.153   94.27 KiB  16           100.0%            130e59d2-2a9a-4039-a42f-deb20afcf288  rack1
> UN  10.109.45.8    104.43 KiB  16           100.0%            35274a2c-f915-4308-9981-d207a4e2108f  rack1
> UN  10.109.66.149  104.23 KiB  16           100.0%            ea0151bc-fb6c-425d-af42-75c10e52f941  rack1
> Datacenter: abssl_dev_tap_tte
> =============================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address        Load       Tokens       Owns (effective)  Host ID                               Rack
> UN  10.110.4.110   104.44 KiB  16           100.0%            fd4a9fa8-f2a9-494c-afb8-7cb8a08c7554  rack1
> UN  10.110.44.220  99.33 KiB  16           100.0%            f1dc35c0-a1c2-45fe-9f65-b1cc3d7f6947  rack1
> UN  10.110.49.242  65.57 KiB  16           100.0%            72bc4ae5-876d-4d0a-91ac-6cf8b531b4dd  rack1
> dbaasprod-ca-abssl-de-393671-v001-yqlvf:~# nodetool describecluster
> Cluster Information:
>       Name: abssl_dev
>       Snitch: org.apache.cassandra.locator.GossipingPropertyFileSnitch
>       DynamicEndPointSnitch: enabled
>       Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>       Schema versions:
>               f68fbc0c-c9d6-3709-8075-c5a0d74192f2: [10.110.4.110, 10.110.44.220, 10.109.6.153, 10.109.45.8, 10.109.66.149, 10.110.49.242]
> {noformat}
> During the upgrade, we re-run the pipeline, which gives us a new server (with a different IP) running the Cassandra 4.0.4 binary.
> The '/data' disk (containing data files, commit logs, etc.) is detached from the old server and attached to the new server.
> This process works fine on a non-SSL cluster, but when we perform it on an SSL cluster, the new node stops communicating with the rest of the nodes.
> In this example, after the upgrade, node 10.110.4.110 was replaced by a new server with the new IP 10.110.44.207.
> *Output from 3.11.4 node:*
> {noformat}
> dbaasprod-ca-abssl-dc-437097-v001-7mump:~# hostname -i
> 10.109.6.153
> dbaasprod-ca-abssl-dc-437097-v001-7mump:~# java -version
> openjdk version "1.8.0_322"
> OpenJDK Runtime Environment (Temurin)(build 1.8.0_322-b06)
> OpenJDK 64-Bit Server VM (Temurin)(build 25.322-b06, mixed mode)
> dbaasprod-ca-abssl-dc-437097-v001-7mump:~# nodetool status
> Datacenter: abssl_dev_tap_ttc
> =============================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address        Load       Tokens       Owns (effective)  Host ID                               Rack
> UN  10.109.6.153   135.24 KiB  16           100.0%            130e59d2-2a9a-4039-a42f-deb20afcf288  rack1
> UN  10.109.45.8    135.35 KiB  16           100.0%            35274a2c-f915-4308-9981-d207a4e2108f  rack1
> UN  10.109.66.149  135.25 KiB  16           100.0%            ea0151bc-fb6c-425d-af42-75c10e52f941  rack1
> Datacenter: abssl_dev_tap_tte
> =============================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address        Load       Tokens       Owns (effective)  Host ID                               Rack
> DN  10.110.4.110   104.44 KiB  16           100.0%            fd4a9fa8-f2a9-494c-afb8-7cb8a08c7554  rack1
> UN  10.110.44.220  104.44 KiB  16           100.0%            f1dc35c0-a1c2-45fe-9f65-b1cc3d7f6947  rack1
> UN  10.110.49.242  65.57 KiB  16           100.0%            72bc4ae5-876d-4d0a-91ac-6cf8b531b4dd  rack1
> dbaasprod-ca-abssl-dc-437097-v001-7mump:~# nodetool describecluster
> Cluster Information:
>       Name: abssl_dev
>       Snitch: org.apache.cassandra.locator.GossipingPropertyFileSnitch
>       DynamicEndPointSnitch: enabled
>       Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>       Schema versions:
>               f68fbc0c-c9d6-3709-8075-c5a0d74192f2: [10.110.44.220, 10.109.6.153, 10.109.45.8, 10.109.66.149, 10.110.49.242]
>               UNREACHABLE: [10.110.4.110]
> {noformat}
> *Output from 4.0.4 node:*
> {noformat}
> dbaasprod-ca-abssl-de-393671-v003-dxpyv:~# hostname -i
> 10.110.44.207
> dbaasprod-ca-abssl-de-393671-v003-dxpyv:~# java -version
> openjdk version "11.0.15" 2022-04-19
> OpenJDK Runtime Environment Temurin-11.0.15+10 (build 11.0.15+10)
> OpenJDK 64-Bit Server VM Temurin-11.0.15+10 (build 11.0.15+10, mixed mode)
> dbaasprod-ca-abssl-de-393671-v003-dxpyv:~# nodetool status
> Datacenter: DC1
> ===============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address        Load        Tokens  Owns (effective)  Host ID                               Rack
> DN  10.109.6.153   ?           16      0.0%              130e59d2-2a9a-4039-a42f-deb20afcf288  r1
> DN  10.109.45.8    ?           16      0.0%              35274a2c-f915-4308-9981-d207a4e2108f  r1
> DN  10.109.66.149  ?           16      0.0%              ea0151bc-fb6c-425d-af42-75c10e52f941  r1
> DN  10.110.44.220  ?           16      0.0%              f1dc35c0-a1c2-45fe-9f65-b1cc3d7f6947  r1
> DN  10.110.49.242  ?           16      0.0%              72bc4ae5-876d-4d0a-91ac-6cf8b531b4dd  r1
> Datacenter: abssl_dev_tap_tte
> =============================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address        Load        Tokens  Owns (effective)  Host ID                               Rack
> UN  10.110.44.207  146.27 KiB  16      100.0%            fd4a9fa8-f2a9-494c-afb8-7cb8a08c7554  rack1
> dbaasprod-ca-abssl-de-393671-v003-dxpyv:~# nodetool describecluster
> Cluster Information:
>       Name: abssl_dev
>       Snitch: org.apache.cassandra.locator.GossipingPropertyFileSnitch
>       DynamicEndPointSnitch: disabled
>       Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>       Schema versions:
>               1ccaeb62-5816-3599-897f-de59fd56eef2: [10.110.44.207]
>               UNREACHABLE: [10.109.45.8, 10.109.66.149, 10.110.44.220, 10.109.6.153, 10.110.49.242]
> Stats for all nodes:
>       Live: 1
>       Joining: 0
>       Moving: 0
>       Leaving: 0
>       Unreachable: 5
> Data Centers:
>       DC1 #Nodes: 5 #Down: 0
>       abssl_dev_tap_tte #Nodes: 1 #Down: 0
> Database versions:
>       : [10.109.45.8:7000, 10.109.66.149:7000, 10.110.44.220:7000, 10.109.6.153:7000, 10.110.49.242:7000]
>       4.0.4: [10.110.44.207:7000]
> Keyspaces:
>       system_schema -> Replication class: LocalStrategy {}
>       system -> Replication class: LocalStrategy {}
>       system_auth -> Replication class: NetworkTopologyStrategy {abssl_dev_tap_tte=3, abssl_dev_tap_ttc=3}
>       system_distributed -> Replication class: NetworkTopologyStrategy {abssl_dev_tap_tte=3, abssl_dev_tap_ttc=3}
>       system_traces -> Replication class: NetworkTopologyStrategy {abssl_dev_tap_tte=3, abssl_dev_tap_ttc=3}
> {noformat}
> We are getting the error below in the system.log file of the new node 10.110.44.207, which runs Cassandra 4.0.4.
> {noformat}
> WARN  [Messaging-EventLoop-3-6] 2022-11-28 06:20:49,577 NoSpamLogger.java:95 - /10.110.44.207:7000->/10.109.45.8:7000-URGENT_MESSAGES-[no-channel] dropping message of type GOSSIP_DIGEST_SYN whose timeout expired before reaching the network
> INFO  [Messaging-EventLoop-3-6] 2022-11-28 06:21:17,921 NoSpamLogger.java:92 - /10.110.44.207:7000->/10.110.49.242:7000-URGENT_MESSAGES-[no-channel] failed to connect
> io.netty.channel.AbstractChannel$AnnotatedConnectException: finishConnect(..) failed: Connection refused: /10.110.49.242:7000
> Caused by: java.net.ConnectException: finishConnect(..) failed: Connection refused
>       at io.netty.channel.unix.Errors.throwConnectException(Errors.java:124)
>       at io.netty.channel.unix.Socket.finishConnect(Socket.java:251)
>       at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.doFinishConnect(AbstractEpollChannel.java:673)
>       at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.finishConnect(AbstractEpollChannel.java:650)
>       at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollOutReady(AbstractEpollChannel.java:530)
>       at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:470)
>       at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
>       at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
>       at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>       at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>       at java.base/java.lang.Thread.run(Thread.java:829)
> {noformat}
> I am attaching the cassandra.yaml and cassandra-env.sh files from both versions (3.11.4 and 4.0.4).
> I am also attaching the system.log file from the upgraded node 10.110.44.207.
> This looks like a bug, hence this Jira. Could you please take a look?
> Let me know if you need any more details.
> Thanks,
> Alaykumar Barochia



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
