[ https://issues.apache.org/jira/browse/CASSANDRA-6156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807255#comment-13807255 ]

Jeremiah Jordan commented on CASSANDRA-6156:
--------------------------------------------

Maybe some kind of "re-bootstrap timeout" where the source node doesn't drop 
the references for a certain amount of time, and adds any new files when the 
new re-bootstrap starts?  It seems like a tricky thing to get right so that you 
don't miss data.  And depending on what happened, you might resend even more 
data than you would have for a fresh bootstrap, since if a large sstable got 
compacted, you would have both the old version and the new version of it to 
send.  And you can't just send the new file, because you might have thrown out 
some tombstones during compaction, and those tombstones could be for data 
already on the node being bootstrapped, so you need those...  It gets hairy 
real quick.
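
Rough sketch of what I'm imagining on the source side, just to make the idea 
concrete.  None of these class or method names exist in Cassandra, they're 
purely illustrative:

import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

final class ReBootstrapRetention
{
    // How long the source keeps stream references around after a bootstrap dies.
    private static final long RETENTION_MS = TimeUnit.HOURS.toMillis(1);

    private static final class RetainedSession
    {
        final Set<String> sstablePaths = new HashSet<>(); // files referenced by the dead session
        long failedAtMs;
    }

    private final Map<String, RetainedSession> byJoiningNode = new ConcurrentHashMap<>();

    // On stream failure, keep the references instead of releasing them right away.
    void onStreamFailure(String joiningNode, Set<String> referencedFiles)
    {
        RetainedSession s = new RetainedSession();
        s.sstablePaths.addAll(referencedFiles);
        s.failedAtMs = System.currentTimeMillis();
        byJoiningNode.put(joiningNode, s);
    }

    // On re-bootstrap, reuse the old references and also add anything flushed or
    // compacted since the failure: the newer files may hold tombstones covering
    // data already shipped in the failed attempt, so they have to go too.
    Set<String> onReBootstrap(String joiningNode, Set<String> filesCreatedSinceFailure)
    {
        RetainedSession s = byJoiningNode.remove(joiningNode);
        if (s == null || System.currentTimeMillis() - s.failedAtMs > RETENTION_MS)
            return null; // timed out: caller falls back to a fresh bootstrap
        s.sstablePaths.addAll(filesCreatedSinceFailure);
        return s.sstablePaths;
    }

    // Periodic sweep so references are eventually dropped as they are today.
    void expire()
    {
        long now = System.currentTimeMillis();
        byJoiningNode.values().removeIf(s -> now - s.failedAtMs > RETENTION_MS);
    }
}

Even with that, anything rewritten by compaction in the window gets resent, so 
it only really pays off if the retry comes quickly.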

> Poor resilience and recovery for bootstrapping node - "unable to fetch range"
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-6156
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6156
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Alyssa Kwan
>            Priority: Minor
>
> We have an 8 node cluster on 1.2.8 using vnodes.  One of our nodes failed and 
> we are having lots of trouble bootstrapping it back.  On each attempt, 
> bootstrapping eventually fails with a RuntimeException "Unable to fetch 
> range".  As far as we can tell, long GC pauses on the sender side cause 
> heartbeat drops or delays, which leads the gossip controller to convict the 
> connection and mark the sender dead.  We've done significant GC tuning to 
> minimize the duration of pauses and raised phi_convict to its max.  It merely 
> lets the bootstrap process take longer to fail.
> The inability to reliably add nodes significantly affects our ability to 
> scale.
> We're not the only ones:  
> http://stackoverflow.com/questions/19199349/cassandra-bootstrap-fails-with-unable-to-fetch-range
> What can we do in the immediate term to bring this node in?  And what's the 
> long term solution?
> One possible solution would be to allow bootstrapping to be an incremental 
> process with individual transfers of vnode ownership instead of attempting to 
> transfer the whole set of vnodes transactionally.  (I assume that's what's 
> happening now.)  I don't know what would have to change on the gossip and 
> token-aware client side to support this.
> Another solution would be to partition sstable files by vnode and allow 
> transfer of those files directly, with some sort of checkpointing plus 
> incremental transfer of writes that arrive after the sstable is transferred.
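
For the second proposal above (partition sstables per vnode, stream them, then 
catch up on writes), here is a rough sketch of the resumable loop it implies.  
All of the names are made up for illustration, not actual Cassandra classes:

import java.util.List;
import java.util.Set;

final class IncrementalBootstrapSketch
{
    static final class VnodeRange
    {
        final String left, right; // token range (left, right]
        VnodeRange(String left, String right) { this.left = left; this.right = right; }
        String key() { return left + ":" + right; }
    }

    // Durable record of completed ranges, e.g. a local system table.
    interface Checkpoint
    {
        Set<String> completed();
        void markComplete(VnodeRange r);
    }

    // Streams a single range from its current owner (placeholder for the real work).
    interface RangeFetcher
    {
        void fetch(VnodeRange r) throws Exception;
    }

    // Announces ownership of one range to the cluster; this is the part the
    // description says would need gossip and token-aware client changes.
    interface OwnershipAnnouncer
    {
        void announce(VnodeRange r);
    }

    static void bootstrap(List<VnodeRange> assigned, Checkpoint cp,
                          RangeFetcher fetcher, OwnershipAnnouncer announcer) throws Exception
    {
        for (VnodeRange r : assigned)
        {
            if (cp.completed().contains(r.key()))
                continue;              // finished in a previous attempt; skip it
            fetcher.fetch(r);          // retries/backoff would wrap this call
            cp.markComplete(r);        // durable before claiming ownership
            announcer.announce(r);     // claim just this vnode, not the whole set
        }
    }
}

A failure or conviction mid-stream would then cost at most the range in flight 
instead of the whole bootstrap.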



--
This message was sent by Atlassian JIRA
(v6.1#6144)
