[
https://issues.apache.org/jira/browse/CASSANDRA-603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790575#action_12790575
]
Jaakko Laine commented on CASSANDRA-603:
----------------------------------------
Unfortunately it doesn't quite work that way :)
First, the case of leaving nodes:
The problem with the current implementation is that pending ranges are
calculated only once, at the time of leaving. Suppose there is a ring of nodes
A, B, C, D and E with replication factor 2. The ring status is this (one row
per node, A through E):
(primary, replica)
E-A, D-E
A-B, E-A
B-C, A-B
C-D, B-C
D-E, C-D
Suppose C prepares to leave. After hearing STATE_LEAVING from C, ring status
will be:
(primary, replica, pending)
E-A, D-E
A-B, E-A
B-C, A-B
C-D, B-C, A-B
D-E, C-D, B-C
Now suppose B also leaves. After receiving STATE_LEAVING from B, the ring
status with the current implementation will be:
E-A, D-E
A-B, E-A
B-C, A-B, E-A
C-D, B-C, A-B
D-E, C-D, B-C
This is clearly wrong: (1) E-A is being streamed to C even though C is itself
leaving, and (2) D is not getting this range even though it is supposed to.
In order to do this right, we need to know at all times which nodes are
leaving and calculate ranges accordingly. An anonymous pending ranges list is
not enough, as it does not tell which node is leaving or whether the ranges
are there because of a bootstrap or a leave operation.
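To make the point concrete, here is a minimal sketch of recalculating pending
ranges from the full set of leaving nodes every time that set changes. It is
not the Cassandra implementation: the class and method names are hypothetical,
node names double as tokens, and replica placement is assumed to be "owner plus
the next RF-1 nodes clockwise", as in the example ring above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch; none of these names come from the Cassandra code base.
class LeavingRangesSketch {

    // Endpoints holding the range that ends at 'token': the first node at or
    // after the token, plus the following rf-1 nodes clockwise. 'ring' is sorted.
    static List<String> replicasFor(String token, List<String> ring, int rf) {
        int start = 0; // wrap to the first node if no node is at or after the token
        for (int i = 0; i < ring.size(); i++) {
            if (ring.get(i).compareTo(token) >= 0) { start = i; break; }
        }
        List<String> out = new ArrayList<>();
        for (int k = 0; k < Math.min(rf, ring.size()); k++) {
            out.add(ring.get((start + k) % ring.size()));
        }
        return out;
    }

    // For each current range (keyed by its end token), the endpoints that must
    // receive it once ALL nodes in 'leaving' are gone.
    static Map<String, Set<String>> pendingRanges(List<String> ring,
                                                  Set<String> leaving, int rf) {
        List<String> future = new ArrayList<>(ring);
        future.removeAll(leaving);
        Map<String, Set<String>> pending = new TreeMap<>();
        for (String rangeEnd : ring) {
            Set<String> gaining = new TreeSet<>(replicasFor(rangeEnd, future, rf));
            gaining.removeAll(replicasFor(rangeEnd, ring, rf));
            if (!gaining.isEmpty()) pending.put(rangeEnd, gaining);
        }
        return pending;
    }

    public static void main(String[] args) {
        List<String> ring = List.of("A", "B", "C", "D", "E");
        System.out.println(pendingRanges(ring, Set.of("C"), 2));      // only C leaving
        System.out.println(pendingRanges(ring, Set.of("B", "C"), 2)); // B and C leaving
    }
}
```

In this sketch the key "A" stands for range E-A. With both B and C in the
leaving set, E-A becomes pending on D and never on C, which is the behaviour
the one-shot calculation fails to produce.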
As for bootstrapping and pending range collision:
Suppose that there is a ring of nodes A, C and E, with replication factor 3.
Node D bootstraps between C and E, so its pending ranges will be E-A, A-C and
C-D. Now suppose node B bootstraps between A and C at the same time. Its
pending ranges would be C-E, E-A and A-B. Both nodes now have pending range E-A
in their list, which causes a pending range collision even though this is only
a replica range, not a primary range. The same thing happens for any nodes that
boot simultaneously between the same two nodes. We cannot simply make pending
ranges a multimap to fix this, since that would make us unable to notice the
real problem of two nodes trying to boot using the same token. In order to do
this properly, we need to know which tokens are booting at any time.
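A minimal sketch of that idea follows: bootstrapping tokens are tracked
explicitly, a collision is reported only when two different endpoints claim the
same token, and overlapping pending ranges are allowed. All names here are
hypothetical (this is not the TokenMetadata API), node names double as tokens,
and each bootstrapping node's ranges are computed as if it alone joined the
ring.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; none of these names come from the Cassandra code base.
class BootstrapTokensSketch {

    // token -> endpoint currently bootstrapping with that token
    private final Map<String, String> bootstrapTokens = new TreeMap<>();

    // Only a duplicate token claimed by a different endpoint is a real conflict.
    void addBootstrapToken(String token, String endpoint) {
        String existing = bootstrapTokens.get(token);
        if (existing != null && !existing.equals(endpoint)) {
            throw new IllegalStateException(
                "bootstrap token collision between " + existing + " and " + endpoint);
        }
        bootstrapTokens.put(token, endpoint);
    }

    // Ranges a node joining with 'newToken' would hold: its primary range plus
    // the rf-1 ranges preceding it on the ring.
    static List<String> pendingRangesFor(List<String> ring, String newToken, int rf) {
        List<String> joined = new ArrayList<>(ring);
        joined.add(newToken);
        Collections.sort(joined);
        int idx = joined.indexOf(newToken);
        int n = joined.size();
        List<String> ranges = new ArrayList<>();
        for (int k = 0; k < Math.min(rf, n); k++) {
            String end = joined.get(((idx - k) % n + n) % n);
            String start = joined.get(((idx - k - 1) % n + n) % n);
            ranges.add(start + "-" + end);
        }
        return ranges;
    }

    public static void main(String[] args) {
        List<String> ring = List.of("A", "C", "E");
        BootstrapTokensSketch state = new BootstrapTokensSketch();
        state.addBootstrapToken("D", "node-d"); // placeholder endpoint names
        state.addBootstrapToken("B", "node-b");
        System.out.println("D: " + pendingRangesFor(ring, "D", 3));
        System.out.println("B: " + pendingRangesFor(ring, "B", 3));
    }
}
```

Both lists contain E-A, and that is accepted here. A second endpoint calling
addBootstrapToken with token "D" would still be rejected, which is the real
conflict worth detecting.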
> pending range collision between nodes
> -------------------------------------
>
> Key: CASSANDRA-603
> URL: https://issues.apache.org/jira/browse/CASSANDRA-603
> Project: Cassandra
> Issue Type: Bug
> Affects Versions: 0.5
> Reporter: Chris Goffinet
> Fix For: 0.5
>
> Attachments: 603.patch
>
>
> We bootstrapped 5 nodes on the east coast from an existing cluster (5) on
> west. We waited at least 60 seconds before starting up each node so it would
> start bootstrapping. We started seeing these types of errors:
> INFO [GMFD:1] 2009-12-04 01:45:42,065 Gossiper.java (line 568) Node /X.X.X.140 has now joined.
> ERROR [GMFD:1] 2009-12-04 01:46:14,371 DebuggableThreadPoolExecutor.java (line 127) Error in ThreadPoolExecutor
> java.lang.RuntimeException: pending range collision between /X.X.X.139 and /X.X.X.140
> at org.apache.cassandra.locator.TokenMetadata.addPendingRange(TokenMetadata.java:242)
> at org.apache.cassandra.service.StorageService.updateBootstrapRanges(StorageService.java:481)
> at org.apache.cassandra.service.StorageService.onChange(StorageService.java:402)
> at org.apache.cassandra.gms.Gossiper.doNotifications(Gossiper.java:692)
> at org.apache.cassandra.gms.Gossiper.applyApplicationStateLocally(Gossiper.java:657)
> at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:610)
> at org.apache.cassandra.gms.GossipDigestAckVerbHandler.doVerb(Gossiper.java:978)
> at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:38)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)
> ERROR [GMFD:1] 2009-12-04 01:46:14,378 CassandraDaemon.java (line 71) Fatal exception in thread Thread[GMFD:1,5,main]
> java.lang.RuntimeException: pending range collision between /X.X.X.139 and /X.X.X.140
> java.lang.RuntimeException: pending range collision between /X.X.X.139 and /X.X.X.140
> at org.apache.cassandra.locator.TokenMetadata.addPendingRange(TokenMetadata.java:242)
> at org.apache.cassandra.service.StorageService.updateBootstrapRanges(StorageService.java:481)
> at org.apache.cassandra.service.StorageService.onChange(StorageService.java:402)
> at org.apache.cassandra.gms.Gossiper.doNotifications(Gossiper.java:692)
> at org.apache.cassandra.gms.Gossiper.applyApplicationStateLocally(Gossiper.java:657)
> at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:610)
> at org.apache.cassandra.gms.GossipDigestAckVerbHandler.doVerb(Gossiper.java:978)
> at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:38)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)