[jira] [Commented] (CASSANDRA-2433) Failed Streams Break Repair

Jonathan Ellis (JIRA) Tue, 30 Aug 2011 09:46:01 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093881#comment-13093881
 ]


Jonathan Ellis commented on CASSANDRA-2433:
-------------------------------------------

bq. it's probably not worth a stage, not even the jmx enabledness maybe

Someone's probably going to want the JMX information but let's keep Stages for 
Verb-associated tasks.

bq. the problem is that we must deal with the case of a node restarting before 
it has been convicted (especially if the conviction threshold is higher), which 
the FD won't see

How about splitting onDead and onRestart in EndpointStateChange, then?  Then RS 
could implement convict and onRestart (ignoring onDead); other ESCS listeners 
could implement onRestart == onDead.  That would maintain the "ESCS is about 
events, FDEL is low-level convict information" separation of roles.
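
A minimal sketch of that split, to make the proposed roles concrete. All names here (EndpointEvents, ConvictionListener, RepairSessionSketch, OtherListenerSketch) are hypothetical, simplified stand-ins, not Cassandra's actual interfaces or signatures, and endpoints are plain strings for brevity. It only illustrates the idea that a repair-session-like listener reacts to convict and onRestart while ignoring onDead, and that other listeners can treat a restart as a death.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins; not Cassandra's real interfaces.

/** Event-level listener (the "ESCS" role): onDead and onRestart are separate callbacks. */
interface EndpointEvents {
    void onDead(String endpoint);
    void onRestart(String endpoint);
}

/** Low-level failure-detector listener (the "FDEL" role). */
interface ConvictionListener {
    void convict(String endpoint);
}

/** Repair-session-like listener: acts on convict and onRestart, ignores onDead. */
class RepairSessionSketch implements EndpointEvents, ConvictionListener {
    private final String peer;
    private final List<String> failures = new ArrayList<>();

    RepairSessionSketch(String peer) {
        this.peer = peer;
    }

    @Override
    public void onDead(String endpoint) {
        // Intentionally ignored: convict already covers a detected death.
    }

    @Override
    public void onRestart(String endpoint) {
        // A restart before conviction is never reported through convict.
        fail(endpoint, "restart");
    }

    @Override
    public void convict(String endpoint) {
        fail(endpoint, "convict");
    }

    // Record a failure only if it concerns the peer this session depends on.
    private void fail(String endpoint, String reason) {
        if (peer.equals(endpoint)) {
            failures.add(reason);
        }
    }

    List<String> failures() {
        return failures;
    }
}

/** Any other listener: treats a restart exactly like a death. */
class OtherListenerSketch implements EndpointEvents {
    final List<String> deadSeen = new ArrayList<>();

    @Override
    public void onDead(String endpoint) {
        deadSeen.add(endpoint);
    }

    @Override
    public void onRestart(String endpoint) {
        onDead(endpoint);
    }
}
```

In this sketch the repair-side listener never double-counts a death (onDead is a no-op because convict already reports it), yet it still learns about a node that bounced before the failure detector could convict it.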

> Failed Streams Break Repair
> ---------------------------
>
>                 Key: CASSANDRA-2433
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2433
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Benjamin Coverston
>            Assignee: Sylvain Lebresne
>              Labels: repair
>             Fix For: 0.8.5
>
>         Attachments: 
> 0001-Put-repair-session-on-a-Stage-and-add-a-method-to-re-v4.patch, 
> 0002-Register-in-gossip-to-handle-node-failures-v4.patch, 
> 0003-Report-streaming-errors-back-to-repair-v4.patch, 
> 0004-Reports-validation-compaction-errors-back-to-repair-v4.patch, 
> 2433.patch, 2433_v2.patch
>
>
> Running repair in cases where a stream fails we are seeing multiple problems.
> 1. Although retry is initiated and completes, the old stream doesn't seem to 
> clean itself up and repair hangs.
> 2. The temp files are left behind and multiple failures can end up filling up 
> the data partition.
> These issues together are making repair very difficult for nearly everyone 
> running repair on a non-trivial sized data set.
> This issue is also being worked on w.r.t CASSANDRA-2088, however that was 
> moved to 0.8 for a few reasons. This ticket is to fix the immediate issues 
> that we are seeing in 0.7.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
