[
https://issues.apache.org/jira/browse/CASSANDRA-6619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883419#comment-13883419
]
Minh Do commented on CASSANDRA-6619:
------------------------------------
As noted in other tickets, 1.1 and 1.2 use different messaging protocols.
Hence, it is important to set the right target version when making outbound
connections rather than depending on inbound connections to set the version
value. This solves the race condition in setting the version values.
The attached patch makes sure the code does this when an outbound connection
is opened and the exchange of version information in the handshake fails.
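A minimal Java sketch of the idea, under stated assumptions: the class OutboundVersionSketch and its methods are hypothetical and are not the code in the attached patch. The versions 4 (1.1) and 5 (1.1.7) come from this comment; 6 for 1.2 is an assumption. The outbound path chooses the version it speaks to a peer and, only after a failed version handshake, falls back to the configured previous version, so no inbound connection has to race to set it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch: the outbound side decides which messaging version to use
 * for a peer instead of waiting for an inbound connection from that peer to set it.
 * Names and structure are illustrative only, not the actual patch.
 */
class OutboundVersionSketch {
    // Version numbers: 4 and 5 are taken from the comment; 6 for 1.2 is assumed.
    static final int VERSION_11 = 4;
    static final int VERSION_117 = 5;
    static final int VERSION_12 = 6;
    static final int CURRENT_VERSION = VERSION_12;

    // Per-peer version chosen by the outbound path.
    private final Map<String, Integer> peerVersions = new ConcurrentHashMap<>();
    // Fallback version for a mixed-version cluster, or null if none is configured.
    private final Integer prevVersion;

    OutboundVersionSketch(Integer prevVersion) {
        this.prevVersion = prevVersion;
    }

    /** Version to use when opening an outbound connection; defaults to the current one. */
    int targetVersion(String peer) {
        return peerVersions.getOrDefault(peer, CURRENT_VERSION);
    }

    /**
     * Called when the version handshake with the peer fails. Records the configured
     * previous version for that peer and returns it, or returns -1 if no fallback
     * is configured.
     */
    int onHandshakeFailure(String peer) {
        if (prevVersion == null) {
            return -1;
        }
        peerVersions.put(peer, prevVersion);
        return prevVersion;
    }

    public static void main(String[] args) {
        OutboundVersionSketch s = new OutboundVersionSketch(VERSION_117);
        System.out.println(s.targetVersion("10.0.0.1"));      // 6
        System.out.println(s.onHandshakeFailure("10.0.0.1")); // 5
        System.out.println(s.targetVersion("10.0.0.1"));      // 5
    }
}
```

In this sketch the fallback is recorded per peer, so only connections to nodes whose handshake actually failed are downgraded; connections between 1.2 nodes keep the current version.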
As discussed with Jason Brown here at Netflix, we came up with the following
approach: during the upgrade, each upgraded node has the variable
cassandra.prev_version set in its environment (5 for 1.1.7, or 4 for 1.1) to
help the handshakes succeed in a mixed-version cluster.
Once the cluster is fully upgraded to 1.2, cassandra.prev_version is removed
from every node's environment, and a rolling restart of C* across the nodes is
required. This step ensures that the new patch does not penalize a fully
upgraded 1.2 cluster, where all outbound connections are from 1.2 to 1.2.
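The configuration step could look roughly like the hypothetical sketch below. It assumes cassandra.prev_version is passed as a JVM system property (for example -Dcassandra.prev_version=5 added to JVM_OPTS) rather than as an OS environment variable, and reads it with Integer.getInteger. The class PrevVersionConfig is illustrative only. When the property is absent, as after the final rolling restart, no fallback is applied.

```java
/**
 * Hypothetical sketch: reads the fallback version used during a 1.1 -> 1.2 upgrade.
 * Assumes cassandra.prev_version is a JVM system property, not an OS environment variable.
 */
class PrevVersionConfig {
    static final String PROP = "cassandra.prev_version";

    /** Returns 4 or 5 while the fallback is configured, or null once it has been removed. */
    static Integer prevVersion() {
        Integer v = Integer.getInteger(PROP); // null if unset or not a valid integer
        if (v != null && v != 4 && v != 5) {
            throw new IllegalArgumentException(PROP + " must be 4 (1.1) or 5 (1.1.7), got " + v);
        }
        return v;
    }

    /** True only while the mixed-version fallback is enabled. */
    static boolean fallbackEnabled() {
        return prevVersion() != null;
    }

    public static void main(String[] args) {
        System.setProperty(PROP, "5");         // mixed 1.1.7 / 1.2 cluster
        System.out.println(prevVersion());     // 5
        System.clearProperty(PROP);            // cluster fully on 1.2, property removed
        System.out.println(fallbackEnabled()); // false
    }
}
```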
> Race condition issue during upgrading 1.1 to 1.2
> ------------------------------------------------
>
> Key: CASSANDRA-6619
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6619
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Minh Do
> Assignee: Minh Do
> Priority: Minor
> Fix For: 1.2.14
>
>
> There is a race condition when upgrading a C* 1.1x cluster to C* 1.2.
> One issue is that an OutboundTCPConnection cannot be established from a 1.2
> node to some 1.1x nodes. Because of this, a live cluster suffers from high
> read latency during the upgrade and is unable to fulfill some write requests.
> This is not a problem for a small cluster, but it is for a large cluster
> (100+ nodes), where the upgrade takes anywhere from 10+ hours to more than a
> day to complete.
> We are aware of CASSANDRA-5692; however, the problem is not fully fixed by it.
> We already have a patch for this and will attach it shortly for feedback.
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)