[ 
https://issues.apache.org/jira/browse/CASSANDRA-603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787452#action_12787452
 ] 

Jaakko Laine commented on CASSANDRA-603:
----------------------------------------

As for the fix, I think there are (at least) two options:

(1) Add a list of pending primary ranges (or tokens) to token metadata. 
Currently primary and replica pending ranges are all in one list, so there is 
no way to check afterwards if primary ranges collide.

(2) Ditch pending ranges completely and convert them to pending tokens. The 
problem with pending ranges is that they are a static structure (determined at 
the time of bootstrap/leaving) that does not react to token changes during the 
operation. This introduces a number of corner cases around node movement that 
are difficult to handle correctly and difficult to prove correct, as shown by 
various recent mail and JIRA discussions. If we had a list of pending tokens 
instead, it would adapt to any changes that happen during the move operation. 
There are currently issues in pending range handling (not cleaned up correctly 
in all cases, thread/atomicity issues, leaving coordination, etc.) that would 
mostly go away if we switched to pending tokens instead, I think. It might be 
that I'm overlooking something obvious here, but to me a dynamically adapting 
pending token list seems more suitable for this.
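
To make option (2) a bit more concrete, here is a minimal sketch of what a
pending token list could look like. Everything in it is an assumption made for
illustration: the class PendingTokensSketch, its method names, the use of long
tokens and the string endpoints are hypothetical and are not the existing
TokenMetadata API. The idea is only that the collision check is done on the
primary token, and that the pending primary range is derived from the current
ring on demand instead of being stored.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch only; not Cassandra's TokenMetadata.
class PendingTokensSketch {
    // Tokens owned by nodes that have fully joined: token -> endpoint.
    private final TreeMap<Long, String> normalTokens = new TreeMap<>();
    // Tokens claimed by nodes that are still bootstrapping: token -> endpoint.
    private final TreeMap<Long, String> pendingTokens = new TreeMap<>();

    void addNormalToken(long token, String endpoint) {
        normalTokens.put(token, endpoint);
    }

    // Collision check on the primary token only: two different endpoints
    // may not claim the same token. Replica ranges are not stored at all.
    void addPendingToken(long token, String endpoint) {
        String owner = pendingTokens.get(token);
        if (owner == null) {
            owner = normalTokens.get(token);
        }
        if (owner != null && !owner.equals(endpoint)) {
            throw new IllegalStateException(
                "token collision between " + owner + " and " + endpoint);
        }
        pendingTokens.put(token, endpoint);
    }

    void removePendingToken(long token) {
        pendingTokens.remove(token);
    }

    // Primary range (left, right] of a pending endpoint, computed from the
    // ring as it looks right now (normal plus pending tokens), so it follows
    // any token change that happens while the node is still bootstrapping.
    // Returns null if the endpoint has no pending token.
    String pendingPrimaryRange(String endpoint) {
        TreeMap<Long, String> ring = new TreeMap<>(normalTokens);
        ring.putAll(pendingTokens);
        for (Map.Entry<Long, String> e : pendingTokens.entrySet()) {
            if (e.getValue().equals(endpoint)) {
                Long left = ring.lowerKey(e.getKey());
                if (left == null) {
                    left = ring.lastKey(); // wrap around the ring
                }
                return "(" + left + "," + e.getKey() + "]";
            }
        }
        return null;
    }

    public static void main(String[] args) {
        PendingTokensSketch ring = new PendingTokensSketch();
        ring.addNormalToken(0L, "nodeA");
        ring.addNormalToken(100L, "nodeB");
        ring.addPendingToken(50L, "nodeC");
        System.out.println(ring.pendingPrimaryRange("nodeC"));
        // A second bootstrapping node changes nodeC's range with no extra bookkeeping.
        ring.addPendingToken(25L, "nodeD");
        System.out.println(ring.pendingPrimaryRange("nodeC"));
    }
}
```

Whether other pending tokens should count when deriving the range is a design
choice this sketch makes arbitrarily. The point is only that nothing computed
at bootstrap time has to be cleaned up or kept in sync later.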


> pending range collision between nodes
> -------------------------------------
>
>                 Key: CASSANDRA-603
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-603
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.5
>            Reporter: Chris Goffinet
>             Fix For: 0.5
>
>
> We bootstrapped 5 nodes on the east coast from an existing cluster (5 nodes) 
> on the west coast. We waited at least 60 seconds before starting up each 
> node so it would start bootstrapping. We started seeing these types of errors:
>  INFO [GMFD:1] 2009-12-04 01:45:42,065 Gossiper.java (line 568) Node /X.X.X.140 has now joined.
> ERROR [GMFD:1] 2009-12-04 01:46:14,371 DebuggableThreadPoolExecutor.java (line 127) Error in ThreadPoolExecutor
> java.lang.RuntimeException: pending range collision between /X.X.X.139 and /X.X.X.140
>         at org.apache.cassandra.locator.TokenMetadata.addPendingRange(TokenMetadata.java:242)
>         at org.apache.cassandra.service.StorageService.updateBootstrapRanges(StorageService.java:481)
>         at org.apache.cassandra.service.StorageService.onChange(StorageService.java:402)
>         at org.apache.cassandra.gms.Gossiper.doNotifications(Gossiper.java:692)
>         at org.apache.cassandra.gms.Gossiper.applyApplicationStateLocally(Gossiper.java:657)
>         at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:610)
>         at org.apache.cassandra.gms.GossipDigestAckVerbHandler.doVerb(Gossiper.java:978)
>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:38)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:619)
> ERROR [GMFD:1] 2009-12-04 01:46:14,378 CassandraDaemon.java (line 71) Fatal exception in thread Thread[GMFD:1,5,main]
> java.lang.RuntimeException: pending range collision between /X.X.X.139 and /X.X.X.140
> java.lang.RuntimeException: pending range collision between /X.X.X.139 and /X.X.X.140
>         at org.apache.cassandra.locator.TokenMetadata.addPendingRange(TokenMetadata.java:242)
>         at org.apache.cassandra.service.StorageService.updateBootstrapRanges(StorageService.java:481)
>         at org.apache.cassandra.service.StorageService.onChange(StorageService.java:402)
>         at org.apache.cassandra.gms.Gossiper.doNotifications(Gossiper.java:692)
>         at org.apache.cassandra.gms.Gossiper.applyApplicationStateLocally(Gossiper.java:657)
>         at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:610)
>         at org.apache.cassandra.gms.GossipDigestAckVerbHandler.doVerb(Gossiper.java:978)
>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:38)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:619)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
