[
https://issues.apache.org/jira/browse/CASSANDRA-14389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16444266#comment-16444266
]
Dinesh Joshi commented on CASSANDRA-14389:
------------------------------------------
I found the issue. When you leave the local side of the socket unbound, the
kernel will prefer the IP address that matches the remote IP. Say node1 with IP
{{127.0.0.1}} wants to open a connection to node2 with IP {{127.0.0.2}}: the
socket would look like {{<127.0.0.2:61002, 127.0.0.2:7000>}} on node1. This
seems to confuse the streaming code. Here's how:
Say we have three nodes, node1, node2 & node3, with IPs {{127.0.0.1, 127.0.0.2,
127.0.0.3}}. node1 has data and node3 is bootstrapping, so node3 requests a
stream from node1. node3 is therefore the `peer` in this case, and node1's code
execution is described below:
* node1 receives the request ({{StreamingInboundHandler#deriveSession}}) and
{{StreamResultFuture#initReceivingSide}} creates a new {{StreamResultFuture}}
and calls {{attachConnection()}}. At this point it has two sets of IPs & ports
for the peer. They are identified by the variable `{{from}}` and the expression
`{{channel.remoteAddress()}}` (a.k.a. `{{connecting}}`).
* {{StreamResultFuture#attachConnection}} calls
{{StreamCoordinator#getOrCreateSessionById}}, passing the `{{from}}` IP &
{{InetAddressAndPort.getByAddressOverrideDefaults(connecting, from.port)}} (!!!)
* The key observation here is `from` is the IP that the peer sent in the
`{{StreamMessageHeader}}` while `connecting` is the remote IP of the peer.
* {{StreamCoordinator#getOrCreateSessionById}} subsequently calls
{{StreamCoordinator#getOrCreateHostData(peer)}}. So we're indexing the
{{peerSessions}} by the `{{peer}}` IP address. We also end up creating a
`{{StreamSession}}` in the process.
* During `{{StreamSession}}` creation, we end up passing the `{{peer}}` and
`{{connecting}}` IPs. We use the `connecting` IP to establish the outbound
connection to the peer. ({{NettyStreamingMessageSender}} is now connected to
`{{connecting}}` IP on port {{7000}}).
In our case, since we leave the local side of the socket unbound, although the
`{{peer}}` correctly sets its IP to {{127.0.0.3}} in the
{{StreamMessageHeader}}, the {{localAddress}} that the kernel chooses for it is
{{127.0.0.1}}. On the inbound side, node1 sees the `peer` as {{127.0.0.3}}, but
the connecting IP address it observes is {{127.0.0.1}}. It therefore prefers
that IP when trying to establish an outbound session. In fact, it establishes a
connection to itself, leading to the `{{Unknown peer requested:
127.0.0.1:7000}}` exception. Note that along the way it actually drops the
ephemeral port and instead uses the port returned by
{{MessagingService#portFor}}.
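The socket-level behaviour described above can be observed outside Cassandra
with plain {{java.net}} sockets. The sketch below is only an illustration: the
class and method names ({{UnboundSocketDemo}}, {{remoteIpSeenByListener}}) are
made up for this example, and it uses {{127.0.0.1}} alone so that it runs on
any machine. It reports the remote IP a listener sees when the client leaves
its local side unbound, and when the client binds explicitly first.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class UnboundSocketDemo {

    /**
     * Starts a listener on targetIp (ephemeral port), connects a client to it,
     * and returns the remote IP as seen by the listener.
     * If bindIp is null the client's local side is left unbound and the
     * kernel picks the source address; otherwise the client binds to bindIp.
     */
    public static String remoteIpSeenByListener(String targetIp, String bindIp) {
        try (ServerSocket listener = new ServerSocket(0, 1, InetAddress.getByName(targetIp));
             Socket client = new Socket()) {
            if (bindIp != null) {
                // Explicit local bind: port 0 still lets the kernel pick the port.
                client.bind(new InetSocketAddress(InetAddress.getByName(bindIp), 0));
            }
            client.connect(
                new InetSocketAddress(listener.getInetAddress(), listener.getLocalPort()), 5000);
            try (Socket accepted = listener.accept()) {
                // This is the listener-side counterpart of channel.remoteAddress().
                return accepted.getInetAddress().getHostAddress();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("unbound: listener sees "
            + remoteIpSeenByListener("127.0.0.1", null));
        System.out.println("bound:   listener sees "
            + remoteIpSeenByListener("127.0.0.1", "127.0.0.1"));
    }
}
```

On a host where {{127.0.0.2}} and {{127.0.0.3}} are usable loopback addresses,
calling {{remoteIpSeenByListener("127.0.0.1", "127.0.0.3")}} should show the
listener seeing {{127.0.0.3}}, which is the address node3 advertises in its
header. Whether those aliases exist depends on the OS configuration.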
Streaming code seems to rely on the perceived remote IP address of the host
rather than the one that is set in the message header. I am not sure if
preferring the IP address set in the header is the correct approach.
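To make the mismatch concrete, here is a deliberately simplified model of the
bookkeeping described in this comment. It is not Cassandra's code: the class
{{StreamPeerMismatchSketch}} and its methods are hypothetical, the map stands
in for {{peerSessions}}, and the way the exception is raised is an assumption
(the comment only says that connecting to itself leads to it). Sessions are
keyed by the header address ({{from}}), while the outbound target is built
from the observed remote address ({{connecting}}) plus the storage port.

```java
import java.util.HashMap;
import java.util.Map;

public class StreamPeerMismatchSketch {
    // Assumed storage port, matching the 7000 used in the example above.
    public static final int STORAGE_PORT = 7000;

    // Hypothetical stand-in for peerSessions: key is the peer (header address),
    // value is the endpoint used for the outbound connection.
    private final Map<String, String> outboundTargetByPeer = new HashMap<>();

    /** Records a session for the peer named in the header, reachable via connectingIp. */
    public void attachConnection(String fromIp, String connectingIp) {
        // The ephemeral port of the inbound socket is dropped; the storage port is used.
        outboundTargetByPeer.put(fromIp + ":" + STORAGE_PORT,
                                 connectingIp + ":" + STORAGE_PORT);
    }

    /** Returns the endpoint the outbound connection for this peer would go to. */
    public String outboundTargetFor(String peerIp) {
        return outboundTargetByPeer.get(peerIp + ":" + STORAGE_PORT);
    }

    /** Assumed lookup: fails if no session is keyed by the given endpoint. */
    public void requireKnownPeer(String endpoint) {
        if (!outboundTargetByPeer.containsKey(endpoint)) {
            throw new IllegalStateException("Unknown peer requested: " + endpoint);
        }
    }

    public static void main(String[] args) {
        StreamPeerMismatchSketch node1 = new StreamPeerMismatchSketch();
        // node3 says it is 127.0.0.3, but its unbound socket shows up as 127.0.0.1.
        node1.attachConnection("127.0.0.3", "127.0.0.1");

        String target = node1.outboundTargetFor("127.0.0.3");
        System.out.println("outbound target: " + target); // node1 itself

        try {
            // The session table is keyed by 127.0.0.3:7000, so this lookup fails.
            node1.requireKnownPeer(target);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model, passing {{127.0.0.3}} as both arguments to
{{attachConnection}} makes the lookup succeed, which is what happens when the
header address and the observed remote address agree.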
> Resolve local address binding in 4.0
> ------------------------------------
>
> Key: CASSANDRA-14389
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14389
> Project: Cassandra
> Issue Type: Bug
> Reporter: Jason Brown
> Assignee: Jason Brown
> Priority: Minor
> Fix For: 4.x
>
>
> CASSANDRA-8457/CASSANDRA-12229 introduced a regression against
> CASSANDRA-12673. This was discovered with CASSANDRA-14362 and moved here for
> resolution independent of that ticket.