[jira] [Updated] (CASSANDRA-3721) Staggering repair

Sylvain Lebresne (Updated) (JIRA) Tue, 14 Feb 2012 03:55:31 -0800

     [ 
https://issues.apache.org/jira/browse/CASSANDRA-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Sylvain Lebresne updated CASSANDRA-3721:
----------------------------------------

    Attachment: 3721.patch

Looking at the global patch for this, there are a few things I'm not a big fan 
of in the DistributedJob approach:
* I think it tries to generalize too much, making DistributedJob hard to follow 
in itself. Typically, why wouldn't the parallel case of DJ use the 
initRequest method? Yes, technically that's because it is used to send 
snapshot commands for RepairJob only in the sequential case, but that makes for 
a poor abstraction imho. Another "proof" of that is the fact that 
DifferencingJob actually doesn't use half of the features DJ is trying to 
abstract.
* As said earlier, it changes more code than we really need to, including 
completely changing how repair synchronization is done. Given that I'm not sure 
it really improves things, I'd prefer avoiding that, if only for the sake of 
having less chance of introducing bugs.
* I believe the differences between the sequential and parallel paths would be 
easier to follow using sub-classing. That may be a personal preference though.

Attaching a version that tries to abstract the sequential vs. parallel request 
business, but only that. The rest of the patch is roughly the same as the 
initial patch, except that it's rebased.
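
To illustrate the sub-classing idea, here is a minimal sketch of how the 
sequential and parallel request paths could share one small abstraction. It is 
not the code in 3721.patch: the class names, method names, and the string log 
standing in for real messages are all hypothetical. The only behavior taken 
from this ticket is that the sequential path sends snapshot commands first and 
then validates one node at a time, while the parallel path asks every node at 
once.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names come from the actual patch.
abstract class RequestCoordinator {
    // Stands in for messages sent over the wire, so the ordering is visible.
    final List<String> log = new ArrayList<>();
    final List<String> endpoints;

    RequestCoordinator(List<String> endpoints) {
        this.endpoints = endpoints;
    }

    /** Sends the initial request(s). */
    abstract void start();

    /** Called when one endpoint has finished validation; returns true once all are done. */
    abstract boolean completed(String endpoint);

    void sendValidation(String endpoint) {
        log.add("validate " + endpoint);
    }
}

// Parallel path: ask every endpoint to validate right away.
class ParallelRequestCoordinator extends RequestCoordinator {
    private int pending;

    ParallelRequestCoordinator(List<String> endpoints) {
        super(endpoints);
    }

    @Override
    void start() {
        pending = endpoints.size();
        for (String e : endpoints)
            sendValidation(e);
    }

    @Override
    boolean completed(String endpoint) {
        return --pending == 0;
    }
}

// Sequential path: snapshot everywhere first, then validate one endpoint at a time.
class SequentialRequestCoordinator extends RequestCoordinator {
    private int next = 0;

    SequentialRequestCoordinator(List<String> endpoints) {
        super(endpoints);
    }

    @Override
    void start() {
        for (String e : endpoints)
            log.add("snapshot " + e);
        if (!endpoints.isEmpty())
            sendValidation(endpoints.get(next++));
    }

    @Override
    boolean completed(String endpoint) {
        if (next == endpoints.size())
            return true;
        // Only move on to the next endpoint once the previous one is done.
        sendValidation(endpoints.get(next++));
        return false;
    }
}
```

With this shape the caller only ever sees start() and completed(), and the 
snapshot step lives entirely in the sequential subclass rather than in a 
shared hook that the parallel case has to skip.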

                
> Staggering repair
> -----------------
>
>                 Key: CASSANDRA-3721
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3721
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.1.0
>            Reporter: Vijay
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.1.1
>
>         Attachments: 0001-add-snapshot-command.patch, 
> 0001-staggering-repair-with-snapshot.patch, 3721.patch
>
>
> Currently repair runs on all the nodes at once, causing the range of data 
> to be hot (higher latency on reads).
> Sequence:
> 1) Send a repair request to all of the nodes so we can hold references to 
> the SSTables (as of the point at which repair was initiated).
> 2) Send Validation to one node at a time (once completed, the references 
> are released).
> 3) Hold the reference to the tree on the requesting node and, once everything 
> is complete, start the diff.
> We can also serialize the streaming part so that no more than 1 node is 
> involved in the streaming.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
