[ 
https://issues.apache.org/jira/browse/CASSANDRA-14848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829276#comment-16829276
 ] 

Per Otterström commented on CASSANDRA-14848:
--------------------------------------------

Tested this in a local ccm cluster. Without the patch I'm able to reproduce the 
reported issue, and I can also verify that the patch solves the issue.

However, I get this error when I set ```enable_legacy_ssl_storage_port: 
false``` _on the last node_ which I didn't expect:
{code:java}
ERROR [main] 2019-04-29 11:39:12,995 CassandraDaemon.java:743 - Exception 
encountered during startup
java.lang.RuntimeException: Unable to gossip with any peers
        at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1546)
        at 
org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:553)
        at 
org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:841)
        at 
org.apache.cassandra.service.StorageService.initServer(StorageService.java:699)
        at 
org.apache.cassandra.service.StorageService.initServer(StorageService.java:650)
        at 
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:379)
        at 
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:609)
        at 
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:721)
{code}

Procedure to reproduce:
{code}
# Create and start cluster
ccm create -v 3.11.4 -n 4 --node-ssl=~/cert c14848
ccm start

# Upgrade node1
ccm node1 stop
ccm node1 setdir
sed -i 's/seeds.*/seeds: 127.0.0.1,127.0.0.2/' 
~/.ccm/c14848/node1/conf/cassandra.yaml
sed -i 's/enable_legacy_ssl_storage_port: false/enable_legacy_ssl_storage_port: 
true/' ~/.ccm/c14848/node1/conf/cassandra.yaml
sed -i '/enable_legacy_ssl_storage_port/{n;s/false/true/}' 
~/.ccm/c14848/node1/conf/cassandra.yaml
ccm ${1} start

# Repeat steps above on node2 and node3 as well

# Repeat steps on node4 but let enable_legacy_ssl_storage_port be false.
{code}

Note that if I perform the upgrade with legacy ssl storage port enabled also on 
the final node, the upgrade will be able to complete without errors. Once that 
is completed it is possible to do a rolling restart while disabling the legacy 
port.

> When upgrading 3.11.3->4.0 using SSL 4.0 nodes does not connect to old non 
> seed nodes
> -------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-14848
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14848
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Messaging/Internode
>            Reporter: Tommy Stendahl
>            Assignee: Tommy Stendahl
>            Priority: Urgent
>              Labels: security
>             Fix For: 4.0
>
>
> When upgrading from 3.11.3 to 4.0 with server encryption enabled the new 4.0 
> node only connects to 3.11.3 seed node, there are no connection established 
> to non-seed nodes on the old version.
> I have four nodes, *.242 is upgraded to 4.0, *.243 and *.244 are 3.11.3 
> non-seed and *.246 are 3.11.3 seed. After starting the 4.0 node I get this 
> nodetool status on the different nodes:
> {noformat}
> *.242
> -- Address Load Tokens Owns (effective) Host ID Rack
> UN 10.216.193.242 1017.77 KiB 256 75,1% 7d278e14-d549-42f3-840d-77cfd852fbf4 
> RAC1
> DN 10.216.193.243 743.32 KiB 256 74,8% 5586243a-ca74-4125-8e7e-09e82e23c4e5 
> RAC1
> DN 10.216.193.244 711.54 KiB 256 75,2% c155e262-b898-4e86-9e1d-d4d0f97e88f6 
> RAC1
> UN 10.216.193.246 659.81 KiB 256 74,9% 502dd00f-fc02-4024-b65f-b98ba3808291 
> RAC1
> *.243 and *.244
> -- Address Load Tokens Owns (effective) Host ID Rack
> DN 10.216.193.242 657.4 KiB 256 75,1% 7d278e14-d549-42f3-840d-77cfd852fbf4 
> RAC1
> UN 10.216.193.243 471 KiB 256 74,8% 5586243a-ca74-4125-8e7e-09e82e23c4e5 RAC1
> UN 10.216.193.244 471.71 KiB 256 75,2% c155e262-b898-4e86-9e1d-d4d0f97e88f6 
> RAC1
> UN 10.216.193.246 388.54 KiB 256 74,9% 502dd00f-fc02-4024-b65f-b98ba3808291 
> RAC1
> *.246
> -- Address Load Tokens Owns (effective) Host ID Rack
> UN 10.216.193.242 657.4 KiB 256 75,1% 7d278e14-d549-42f3-840d-77cfd852fbf4 
> RAC1
> UN 10.216.193.243 471 KiB 256 74,8% 5586243a-ca74-4125-8e7e-09e82e23c4e5 RAC1
> UN 10.216.193.244 471.71 KiB 256 75,2% c155e262-b898-4e86-9e1d-d4d0f97e88f6 
> RAC1
> UN 10.216.193.246 388.54 KiB 256 74,9% 502dd00f-fc02-4024-b65f-b98ba3808291 
> RAC1
> {noformat}
>  
> I have built 4.0 with wire tracing activated and in my config the 
> storage_port=12700 and ssl_storage_port=12701. In the log I can see that the 
> 4.0 node start to connect to the 3.11.3 seed node on the storage_port but 
> quickly switch to the ssl_storage_port, but when connecting to the non-seed 
> nodes it never switch to the ssl_storage_port.
> {noformat}
> >grep 193.246 system.log | grep Outbound
> 2018-10-25T10:57:36.799+0200 [MessagingService-NettyOutbound-Thread-4-1] INFO 
> i.n.u.internal.logging.Slf4JLogger:101 info [id: 0x2f0e5e55] CONNECT: 
> /10.216.193.246:12700
> 2018-10-25T10:57:36.902+0200 [MessagingService-NettyOutbound-Thread-4-2] INFO 
> i.n.u.internal.logging.Slf4JLogger:101 info [id: 0x9e81f62c] CONNECT: 
> /10.216.193.246:12701
> 2018-10-25T10:57:36.905+0200 [MessagingService-NettyOutbound-Thread-4-2] INFO 
> i.n.u.internal.logging.Slf4JLogger:101 info [id: 0x9e81f62c, 
> L:/10.216.193.242:37252 - R:10.216.193.246/10.216.193.246:12701] ACTIVE
> 2018-10-25T10:57:36.906+0200 [MessagingService-NettyOutbound-Thread-4-2] INFO 
> i.n.u.internal.logging.Slf4JLogger:101 info [id: 0x9e81f62c, 
> L:/10.216.193.242:37252 - R:10.216.193.246/10.216.193.246:12701] WRITE: 8B
> >grep 193.243 system.log | grep Outbound
> 2018-10-25T10:57:38.438+0200 [MessagingService-NettyOutbound-Thread-4-3] INFO 
> i.n.u.internal.logging.Slf4JLogger:101 info [id: 0xd8f1d6c4] CONNECT: 
> /10.216.193.243:12700
> 2018-10-25T10:57:38.540+0200 [MessagingService-NettyOutbound-Thread-4-4] INFO 
> i.n.u.internal.logging.Slf4JLogger:101 info [id: 0xfde6cc9f] CONNECT: 
> /10.216.193.243:12700
> 2018-10-25T10:57:38.694+0200 [MessagingService-NettyOutbound-Thread-4-5] INFO 
> i.n.u.internal.logging.Slf4JLogger:101 info [id: 0x7e87fc4e] CONNECT: 
> /10.216.193.243:12700
> 2018-10-25T10:57:38.741+0200 [MessagingService-NettyOutbound-Thread-4-7] INFO 
> i.n.u.internal.logging.Slf4JLogger:101 info [id: 0x39395296] CONNECT: 
> /10.216.193.243:12700{noformat}
>  
> When I had the dbug log activated and started the 4.0 node I can see that it 
> switch port for *.246 but not for *.243 and *.244.
> {noformat}
> >grep DEBUG system.log| grep OutboundMessagingConnection | grep 
> >maybeUpdateConnectionId
> 2018-10-25T13:12:56.095+0200 [ScheduledFastTasks:1] DEBUG 
> o.a.c.n.a.OutboundMessagingConnection:314 maybeUpdateConnectionId changing 
> connectionId to 10.216.193.246:12701 (GOSSIP), with a different port for 
> secure communication, because peer version is 11
> 2018-10-25T13:12:58.100+0200 [ReadStage-1] DEBUG 
> o.a.c.n.a.OutboundMessagingConnection:314 maybeUpdateConnectionId changing 
> connectionId to 10.216.193.246:12701 (SMALL_MESSAGE), with a different port 
> for secure communication, because peer version is 11
> 2018-10-25T13:13:05.764+0200 [main] DEBUG 
> o.a.c.n.a.OutboundMessagingConnection:314 maybeUpdateConnectionId changing 
> connectionId to 10.216.193.246:12701 (LARGE_MESSAGE), with a different port 
> for secure communication, because peer version is 11
> {noformat}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to