[ https://issues.apache.org/jira/browse/CASSANDRA-18560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17732449#comment-17732449 ]
Maciej Sokol edited comment on CASSANDRA-18560 at 6/14/23 8:49 AM:
-------------------------------------------------------------------
Hey Brandon,
The issue is that streaming uses OutboundConnectionSettings::connectTo()
directly, without checking whether the peer is in the local DC. It also ignores
prefer_local entirely, so the problem occurs even with prefer_local=false.
Using OutboundConnectionSettings.withConnectTo() (passing connectTo()):
[NettyStreamingMessageSender.java#L241|https://github.com/apache/cassandra/blob/5143bd81e82c35ce686dd40860ec2aebe30aaf22/src/java/org/apache/cassandra/streaming/async/NettyStreamingMessageSender.java#L241]
Using OutboundConnectionSettings.withDefaults() (which uses connectTo()):
[DefaultConnectionFactory.java#L49|https://github.com/apache/cassandra/blob/5143bd81e82c35ce686dd40860ec2aebe30aaf22/src/java/org/apache/cassandra/streaming/DefaultConnectionFactory.java#L49]
withDefaults:
[OutboundConnectionSettings.java#L481|https://github.com/apache/cassandra/blob/5143bd81e82c35ce686dd40860ec2aebe30aaf22/src/java/org/apache/cassandra/net/OutboundConnectionSettings.java#L481]
connectTo() (the connectTo field is always null in the streaming case):
[OutboundConnectionSettings.java#L451|https://github.com/apache/cassandra/blob/cassandra-4.0/src/java/org/apache/cassandra/net/OutboundConnectionSettings.java#L451]
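To make the difference concrete, here is a simplified Java model of the two address-selection behaviours. It is a sketch under stated assumptions, not Cassandra code: the class PreferLocalSketch, the Peer record and the methods dcBlindAddress/dcAwareAddress are all hypothetical, "private IP" stands for whatever local address the node has learned for a peer (for example INTERNAL_ADDRESS_AND_PORT from gossip), and 203.0.113.10 is a documentation placeholder for the redacted public IP. dcBlindAddress mirrors the behaviour described above (use the private address whenever one is known, regardless of DC or prefer_local); dcAwareAddress uses it only for same-DC peers when prefer_local is enabled.

```java
import java.util.Objects;

/**
 * Hypothetical, simplified model of outbound address selection.
 * It does NOT reproduce OutboundConnectionSettings; names are illustrative.
 */
public final class PreferLocalSketch {

    /** Minimal stand-in for what a node knows about a peer (hypothetical). */
    record Peer(String dc, String broadcastIp, String privateIp) {}

    /**
     * DC-blind choice: returns the private IP whenever one is known,
     * ignoring both the peer's DC and the prefer_local setting.
     */
    static String dcBlindAddress(Peer peer) {
        return peer.privateIp() != null ? peer.privateIp() : peer.broadcastIp();
    }

    /**
     * DC-aware choice: returns the private IP only when prefer_local is
     * enabled and the peer is in the local DC; otherwise the broadcast IP.
     */
    static String dcAwareAddress(Peer peer, String localDc, boolean preferLocal) {
        boolean sameDc = Objects.equals(localDc, peer.dc());
        if (preferLocal && sameDc && peer.privateIp() != null) {
            return peer.privateIp();
        }
        return peer.broadcastIp();
    }

    public static void main(String[] args) {
        // Remote-DC peer; 203.0.113.10 is a placeholder for the public IP.
        Peer remoteDcPeer = new Peer("us-east-1", "203.0.113.10", "10.26.5.11");
        // Hypothetical name for the new node's own DC.
        String localDc = "other-dc";

        // DC-blind: picks the private address, unreachable across DCs.
        System.out.println(dcBlindAddress(remoteDcPeer));
        // DC-aware: falls back to the broadcast (public) address.
        System.out.println(dcAwareAddress(remoteDcPeer, localDc, true));
    }
}
```

Running main prints 10.26.5.11 (the address in the connection-timeout log quoted below) followed by 203.0.113.10. Because dcBlindAddress does not take prefer_local as an input at all, turning prefer_local off cannot change its result, which matches the observation that the problem also occurs with prefer_local=false.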
> Incorrect IP used for gossip across DCs with prefer_local=true
> --------------------------------------------------------------
>
> Key: CASSANDRA-18560
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18560
> Project: Cassandra
> Issue Type: Bug
> Components: Cluster/Gossip
> Reporter: Brad Vernon
> Assignee: Brandon Williams
> Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.x
>
>
> After installing a new node running 4.0.10, we saw the new node attempt to
> connect to the private IP of a random subset of nodes in remote DCs, which are
> reachable only via their public IPs for cross-DC communication.
> Only the new node's outbound connections were affected; inbound connections
> from pre-4.0.10 nodes were not. system.peers_v2 (below) showed preferred_ip and
> preferred_port as null; only peers in the 4.0.10 node's DC have preferred_ip
> values, as expected.
> We believe the issue originated with
> https://issues.apache.org/jira/browse/CASSANDRA-16718
> Cluster details:
> * All nodes have both a public IP and a private IP configured
> * Listen/RPC addresses are set to the private IP; the broadcast address is the public IP
> * prefer_local=true is enabled for all nodes
> The log that showed the connection failing:
> {code:java}
> INFO [Messaging-EventLoop-3-8] 2023-06-01 00:14:21,565 NoSpamLogger.java:92 - /99.81.<redacted>:7000->/44.208.<redacted>:7000-URGENT_MESSAGES-[no-channel] failed to connect
> io.netty.channel.ConnectTimeoutException: connection timed out: /10.26.5.11:7000
>     at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$2.run(AbstractEpollChannel.java:576)
> {code}
> The 99.x and 44.x instances can reach each other only via their public IPs.
> gossipinfo output from the 4.0.10 node:
> {code:java}
> /44.208.<redacted>
> generation:1661113358
> heartbeat:25267691
> LOAD:25267683:1.7882044268E10
> SCHEMA:24692061:e98b918d-499f-3ccc-8dbe-5af31f685bda
> DC:13:us-east-1
> RACK:15:1a
> RELEASE_VERSION:6:4.0.5
> NET_VERSION:2:12
> HOST_ID:3:9a41e668-060d-4cfe-bb1e-013f5116422d
> RPC_READY:1407:true
> INTERNAL_ADDRESS_AND_PORT:9:10.26.5.11:7000
> NATIVE_ADDRESS_AND_PORT:4:44.208.<redacted>:9042
> STATUS_WITH_PORT:1393:NORMAL,-2262036356854762881
> SSTABLE_VERSIONS:7:big-nb
> TOKENS:1392:<hidden> {code}
> Peers output from the 4.0.10 node:
> {code:java}
>  peer              | peer_port | data_center | host_id                              | native_address    | native_port | preferred_ip | preferred_port | rack | release_version | schema_version                       | tokens
> -------------------+-----------+-------------+--------------------------------------+-------------------+-------------+--------------+----------------+------+-----------------+--------------------------------------+--------
>  44.208.<redacted> |      7000 |   us-east-1 | 9a41e668-060d-4cfe-bb1e-013f5116422d | 44.208.<redacted> |        9042 |         null |           null |   1a |           4.0.5 | e98b918d-499f-3ccc-8dbe-5af31f685bda | {'-2262036356854762881', '-4197710115038136897', '-7072386316096662315', '2085255826742630980', '249732489387853170', '4976300208126705818', '7187184456885833289', '8777189009399731927'}
> {code}
> As a temporary workaround, we used iptables to route outbound traffic destined
> for the private IPs to the public IPs, after which outbound connections succeeded.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)