[
https://issues.apache.org/jira/browse/CASSANDRA-5669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13703209#comment-13703209
]
Jason Brown edited comment on CASSANDRA-5669 at 7/9/13 12:28 PM:
-----------------------------------------------------------------
I spent a lot of time thinking about this :), and I think the situation in this
ticket is subtly different from what happened in CASSANDRA-5171/CASSANDRA-5432.
I commented on that ticket as to why I think it had a problem (short answer:
connecting to publicIP on non-SSL port). This ticket does not get us into that
situation as we will continue to connect to the publicIP/(SSL) port - we simply
bypass reconnecting on the local port if we see the other node has a lower
messaging version.
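As a rough sketch of that check (the class, method, and constant names below are hypothetical, and this is not the code in the attached patches): the switch to the local address is only attempted when the peer is in the same region and is not on a lower messaging version.

```java
// Hypothetical sketch; not the actual Ec2MultiRegionSnitch code.
final class LocalReconnectCheck {
    // Assumed stand-in for this node's own messaging version.
    static final int CURRENT_VERSION = 6;

    // Returns true only when it is safe to drop the publicIP connection
    // and reconnect on the local address. A peer with a lower messaging
    // version stays on the publicIP/(SSL) connection.
    static boolean shouldReconnectOnLocalAddress(boolean sameRegion, int peerVersion) {
        return sameRegion && peerVersion >= CURRENT_VERSION;
    }

    public static void main(String[] args) {
        System.out.println(shouldReconnectOnLocalAddress(true, CURRENT_VERSION));
        System.out.println(shouldReconnectOnLocalAddress(true, CURRENT_VERSION - 1));
        System.out.println(shouldReconnectOnLocalAddress(false, CURRENT_VERSION));
    }
}
```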
I did test out this upgrade scenario a few weeks ago when we concocted it (and
it worked), and will be happy to try it out again. It'll take a few hours
(including time for dropping kids off at camp), so I'll update this ticket later
in the morning.
was (Author: jasobrown):
I spent a lot of time thinking about this :), and I think the situation in
this ticket is subtly different from what happened in CASSANDRA-5171. I
commented on that ticket as to why I think it had a problem (short answer:
connecting to publicIP on non-SSL port). This ticket does not get us into that
situation as we will continue to connect to the publicIP/(SSL) port - we simply
bypass reconnecting on the local port if we see the other node has a lower
messaging version.
I did test out this upgrade scenario a few weeks ago when we concocted it (and
it worked), and will be happy to try it out again. It'll take a few hours
(including time for dropping kids off at camp), so I'll update this ticket later
in the morning.
> Connection thrashing in multi-region ec2 during upgrade, due to messaging
> version
> ---------------------------------------------------------------------------------
>
> Key: CASSANDRA-5669
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5669
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 1.2.5
> Reporter: Jason Brown
> Assignee: Jason Brown
> Priority: Minor
> Labels: ec2, ec2multiregionsnitch, gossip
> Fix For: 1.2.6, 2.0 beta 1
>
> Attachments: 5669-v1.diff, 5669-v2.diff
>
>
> While debugging the upgrade scenario described in CASSANDRA-5660, I
> discovered that ITC.close() resets the message protocol version of a peer
> node that disconnects. CASSANDRA-5660 has a full description of the upgrade
> path, but basically the Ec2MultiRegionSnitch will close connections on the
> publicIP addr to reconnect on the privateIP, and this causes ITC to drop the
> message protocol version of previously known nodes. I think we want to hang
> onto that version so that when the newer node (re-)connects to the older
> node, it passes the correct protocol version rather than the current
> version (too high for the older node). Otherwise the connection attempt gets
> dropped, and we go through the dance again.
> To clarify, the 'thrashing' is at a rather low volume, from what I observed.
> Anecdotally, perhaps one connection per second gets turned over.
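
To illustrate the difference between resetting and retaining the peer's version on close, here is a minimal sketch. The registry class, its method names, and the version numbers are all assumptions made for illustration; it does not reproduce the real ITC or MessagingService code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry of the messaging version last seen from each peer.
final class PeerVersionRegistry {
    // Assumed stand-in for this node's own messaging version.
    static final int CURRENT_VERSION = 6;

    private final Map<String, Integer> versions = new ConcurrentHashMap<>();

    void setVersion(String peer, int version) {
        versions.put(peer, version);
    }

    // Version to use when (re-)connecting: the peer's known version,
    // capped at our own; an unknown peer is assumed to be current.
    int versionFor(String peer) {
        return Math.min(versions.getOrDefault(peer, CURRENT_VERSION), CURRENT_VERSION);
    }

    // Behaviour described in the report: closing forgets the peer's version,
    // so the next connection attempt uses CURRENT_VERSION.
    void closeAndReset(String peer) {
        versions.remove(peer);
    }

    // Proposed behaviour: closing leaves the known version in place.
    void closeAndRetain(String peer) {
        // intentionally does not touch the version map
    }

    public static void main(String[] args) {
        PeerVersionRegistry registry = new PeerVersionRegistry();
        registry.setVersion("older-node", 5);
        registry.closeAndRetain("older-node");
        System.out.println(registry.versionFor("older-node"));
        registry.closeAndReset("older-node");
        System.out.println(registry.versionFor("older-node"));
    }
}
```

With the version forgotten, the newer node reconnects at its own version, which the older node rejects; with it retained, the reconnect uses the version the older node understands.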
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira