[ 
https://issues.apache.org/jira/browse/CASSANDRA-5692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13794960#comment-13794960
 ] 

Sergio Bossa commented on CASSANDRA-5692:
-----------------------------------------

[~jjordan], do we have thread dumps from the timeout failures (taken prior to 
the timeout)? If they don't involve the connect method, we're probably seeing a 
different race.

Anyways, I'll have a look.

> Race condition in detecting version on a mixed 1.1/1.2 cluster
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-5692
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5692
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.9, 1.2.5
>            Reporter: Sergio Bossa
>            Assignee: Sergio Bossa
>            Priority: Minor
>             Fix For: 1.2.7, 2.0 beta 1
>
>         Attachments: 5692-0005.patch, 5692-0006.patch
>
>
> On a mixed 1.1 / 1.2 cluster, starting 1.2 nodes sometimes triggers a race 
> condition in version detection, where the 1.2 node wrongly detects version 6 
> for a 1.1 node.
> It works as follows:
> 1) The just started 1.2 node quickly opens an OutboundTcpConnection toward a 
> 1.1 node before receiving any messages from the latter.
> 2) Because the version is detected correctly only once the first message is 
> received, the version is momentarily set to 6.
> 3) This opens an OutboundTcpConnection from 1.2 to 1.1 at version 6, which 
> gets stuck in the connect() method.
> Later, the version is corrected, but by then all outbound connections from 
> 1.2 to 1.1 are stuck.
> Evidence from 1.2 logs:
> TRACE 13:48:31,133 Assuming current protocol version for /127.0.0.2
> DEBUG 13:48:37,837 Setting version 5 for /127.0.0.2
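
The ordering described in the steps above can be illustrated with a minimal Java sketch. This is not Cassandra's actual code: the class VersionRaceSketch, its methods, and the constants are hypothetical names chosen for illustration, and the version numbers 6 and 5 are taken only from the description and log lines quoted above. It assumes that an unknown peer version falls back to the node's own current version, which is what the "Assuming current protocol version" trace line suggests.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the ordering problem; not Cassandra's real classes.
final class VersionRaceSketch {
    // Assumption: 6 is what a 1.2 node uses when it knows nothing about a peer.
    static final int CURRENT_VERSION = 6;
    // Assumption: 5 is the version eventually logged for the 1.1 peer.
    static final int VERSION_11 = 5;

    private final Map<String, Integer> versions = new ConcurrentHashMap<>();

    // Called once the first inbound message from a peer reveals its version.
    void setVersion(String endpoint, int version) {
        versions.put(endpoint, version);
    }

    // Falls back to the current version when nothing has been learned yet.
    int getVersion(String endpoint) {
        Integer known = versions.get(endpoint);
        return known == null ? CURRENT_VERSION : known;
    }

    public static void main(String[] args) {
        VersionRaceSketch node12 = new VersionRaceSketch();
        String peer = "/127.0.0.2";

        // Steps 1-3: the outbound connection is opened before any inbound message.
        int assumed = node12.getVersion(peer);

        // Later: the first inbound message arrives and the version is corrected.
        node12.setVersion(peer, VERSION_11);
        int corrected = node12.getVersion(peer);

        System.out.println("assumed=" + assumed + " corrected=" + corrected);
    }
}
```

In this sketch, a connection set up with the assumed value would keep using version 6 even after the stored version changes to 5, which matches the window between the two log lines above.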



--
This message was sent by Atlassian JIRA
(v6.1#6144)
