[jira] Updated: (CASSANDRA-742) write operation will throw internal error if the bootstrapping node is down

david.pan (JIRA) Tue, 26 Jan 2010 02:06:58 -0800

     [ 
https://issues.apache.org/jira/browse/CASSANDRA-742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


david.pan updated CASSANDRA-742:
--------------------------------

    Attachment: 742-write_failed_when_bootstrapping_down.patch

This patch is not a perfect solution for this issue, but it lets me sleep 
through the night and deal with the failure the next morning.  :-)

This patch removes the bootstrapping endpoint from the tokenMetadata once 
the other nodes detect that this node is down.
Write operations will time out until the other nodes detect that the 
bootstrapping node is down, but they succeed again after those nodes remove 
the bootstrapping node from the pendingRanges.
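
The idea behind the patch can be sketched roughly as follows. This is only an 
illustration, not the patch itself: RingSketch, onDead(), and the string 
endpoints are hypothetical names, and the real TokenMetadata is more involved.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-in for the ring metadata; not the real TokenMetadata.
class RingSketch {
    // Endpoints that are full members of the ring, with their tokens.
    private final Map<String, Long> memberTokens = new HashMap<>();
    // Endpoints that are still bootstrapping and only appear in the pending ranges.
    private final Set<String> pendingEndpoints = new HashSet<>();

    void addMember(String endpoint, long token) { memberTokens.put(endpoint, token); }

    void addBootstrapping(String endpoint) { pendingEndpoints.add(endpoint); }

    boolean isMember(String endpoint) { return memberTokens.containsKey(endpoint); }

    boolean isPending(String endpoint) { return pendingEndpoints.contains(endpoint); }

    // Assumed hook that runs when the other nodes detect that an endpoint is down.
    // A dead bootstrapping endpoint is dropped so that writes stop targeting it.
    // A dead ring member is kept, because it still owns its token.
    void onDead(String endpoint) {
        if (!isMember(endpoint)) {
            pendingEndpoints.remove(endpoint);
        }
    }

    public static void main(String[] args) {
        RingSketch ring = new RingSketch();
        ring.addMember("10.0.0.1", 100L);
        ring.addBootstrapping("10.0.0.9");

        ring.onDead("10.0.0.9"); // the bootstrapping node goes down
        System.out.println("still pending: " + ring.isPending("10.0.0.9"));
    }
}
```

Between the moment the node dies and the moment onDead() runs, writes would 
still target the dead node, which matches the timeout window described above.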

> write operation will throw internal error if the bootstrapping node is down
> ---------------------------------------------------------------------------
>
>                 Key: CASSANDRA-742
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-742
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.5
>         Environment: linux2.6
>            Reporter: david.pan
>             Fix For: 0.6
>
>         Attachments: 742-write_failed_when_bootstrapping_down.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Steps to reproduce:
> 1) bootstrap a node A;
> 2) keep inserting data while it is bootstrapping;
> 3) stop the service on node A;
> 4) the following exception then appears:
> ERROR [pool-1-thread-9] 2010-01-26 10:32:39,688 Cassandra.java (line 1064) Internal error processing insert
> java.lang.AssertionError
> at org.apache.cassandra.locator.TokenMetadata.getToken(TokenMetadata.java:213)
> at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedMapForEndpoints(AbstractReplicationStrategy.java:142)
> at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedEndpoints(AbstractReplicationStrategy.java:76)
> at org.apache.cassandra.service.StorageService.getHintedEndpointMap(StorageService.java:1188)
> at org.apache.cassandra.service.StorageProxy.insertBlocking(StorageProxy.java:169)
> at org.apache.cassandra.service.CassandraServer.doInsert(CassandraServer.java:466)
> at org.apache.cassandra.service.CassandraServer.insert(CassandraServer.java:417)
> at org.apache.cassandra.service.Cassandra$Processor$insert.process(Cassandra.java:1056)
> at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:817)
> at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
> at java.lang.Thread.run(Thread.java:619)
> I traced the code and found that 
> "org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedMapForEndpoints(Collection<InetAddress>)"
>  selects a hinted endpoint for every dead endpoint, no matter whether it is a 
> normal node or a bootstrapping node. To get the token of the dead endpoint, 
> this method calls "tokenMetadata_.getToken(ep);", but getToken() asserts that 
> the endpoint is a member of the ring. The bootstrapping endpoint is not yet a 
> member, so the assertion fails and an internal error is thrown.
> This exception keeps being thrown until I bootstrap the node again. This is 
> really a big problem for me, because the bootstrapping lasts 30 hours and 
> my machines are not very reliable. I have to get out of bed at night to deal 
> with this failure. :-(
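
The failure described in the report can be reproduced in miniature with the 
sketch below. It is a simplified model, not Cassandra's code: HintSketch and 
its methods are hypothetical, endpoints are plain strings, and an explicit 
throw stands in for the assert so that it fires without the JVM's -ea flag.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the failure; these are not Cassandra's real classes.
class HintSketch {
    // Only endpoints that have fully joined the ring have a token here.
    private final Map<String, Long> memberTokens = new HashMap<>();

    void addMember(String endpoint, long token) { memberTokens.put(endpoint, token); }

    boolean isMember(String endpoint) { return memberTokens.containsKey(endpoint); }

    // Models the membership assertion described in the report:
    // asking for the token of a non-member is treated as an internal error.
    long getToken(String endpoint) {
        if (!isMember(endpoint)) {
            throw new AssertionError(endpoint + " is not a member of the ring");
        }
        return memberTokens.get(endpoint);
    }

    public static void main(String[] args) {
        HintSketch ring = new HintSketch();
        ring.addMember("10.0.0.1", 100L);

        // Dead ring member: the token lookup needed for hinting succeeds.
        System.out.println("member token: " + ring.getToken("10.0.0.1"));

        // Dead bootstrapping node: it was never added as a member, so the lookup fails.
        try {
            ring.getToken("10.0.0.9");
        } catch (AssertionError e) {
            System.out.println("lookup failed: " + e.getMessage());
        }
    }
}
```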

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
