[ https://issues.apache.org/jira/browse/CASSANDRA-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sylvain Lebresne updated CASSANDRA-3721:
----------------------------------------
Attachment: 3721.patch
Looking at the global patch for this, there are a few things I'm not a big fan
of in the DistributedJob approach:
* I think it tries to generalize too much, making DistributedJob hard to follow
in itself. Typically, why wouldn't the parallel case of DJ use the
initRequest method? Yes, technically that's because it is used to send
snapshot commands for RepairJob only in the sequential case, but that makes for
a poor abstraction imho. Another "proof" of that is the fact that
DifferencingJob actually doesn't use half of the features DJ is trying to
abstract.
* As said earlier, it changes more code than we really need to, including
completely changing how repair synchronization is done. Given that I'm not sure
it really improves things, I'd prefer avoiding that, if only to reduce the
chance of introducing bugs.
* I believe the differences between the sequential and parallel paths would be
easier to follow using sub-classing. That may be a personal preference though.
Attaching a version that tries to abstract the sequential vs parallel request
business but only that. The rest of the patch is roughly the same as the
initial patch except that it's rebased.
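
To illustrate the sub-classing idea above, here is a minimal sketch of how the
sequential vs parallel request handling could be split. All class and method
names are hypothetical and chosen for illustration only; they are not taken
from the attached patch. The sketch is single-threaded and ignores failures
and timeouts.

```java
import java.util.ArrayDeque;
import java.util.LinkedHashSet;
import java.util.Queue;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical base class: subclasses only decide WHEN each request is sent.
abstract class RequestCoordinator<R> {
    private final Consumer<R> sender; // how a single request is actually sent

    protected RequestCoordinator(Consumer<R> sender) { this.sender = sender; }

    protected void send(R request) { sender.accept(request); }

    abstract void add(R request);      // register a request before start()
    abstract void start();             // begin sending
    abstract int completed(R request); // mark one done; returns how many remain
}

// Sequential path: at most one request is in flight at any time.
class SequentialRequestCoordinator<R> extends RequestCoordinator<R> {
    private final Queue<R> pending = new ArrayDeque<>();

    SequentialRequestCoordinator(Consumer<R> sender) { super(sender); }

    @Override void add(R request) { pending.add(request); }

    @Override void start() {
        if (!pending.isEmpty()) send(pending.peek());
    }

    @Override int completed(R request) {
        // The head of the queue is the only request that can be in flight.
        if (!request.equals(pending.peek()))
            throw new IllegalStateException("unexpected completion: " + request);
        pending.remove();
        if (!pending.isEmpty()) send(pending.peek()); // kick off the next one
        return pending.size();
    }
}

// Parallel path: everything is sent up front.
class ParallelRequestCoordinator<R> extends RequestCoordinator<R> {
    private final Set<R> outstanding = new LinkedHashSet<>();

    ParallelRequestCoordinator(Consumer<R> sender) { super(sender); }

    @Override void add(R request) { outstanding.add(request); }

    @Override void start() {
        for (R request : outstanding) send(request);
    }

    @Override int completed(R request) {
        outstanding.remove(request);
        return outstanding.size();
    }
}
```

With this shape the caller adds its requests, calls start(), and reports each
response through completed(); a return value of 0 means every request has
finished, whichever subclass is in use.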
> Staggering repair
> -----------------
>
> Key: CASSANDRA-3721
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3721
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Affects Versions: 1.1.0
> Reporter: Vijay
> Assignee: Vijay
> Priority: Minor
> Fix For: 1.1.1
>
> Attachments: 0001-add-snapshot-command.patch,
> 0001-staggering-repair-with-snapshot.patch, 3721.patch
>
>
> Currently repair runs on all the nodes at once, causing the range of data
> to be hot (higher latency on reads).
> Sequence:
> 1) Send a repair request to all of the nodes so we can hold the references of
> the SSTables (point at which repair was initiated)
> 2) Send a validation request to one node at a time (each node releases its
> references once its validation completes).
> 3) Hold the references to the trees on the requesting node and, once
> everything is complete, start the diff.
> We can also serialize the streaming part so that no more than 1 node is
> involved in streaming at a time.
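
The sequence in the description above can be sketched as a small,
single-threaded simulation. This is only an illustration under stated
assumptions: RepairNode, InMemoryNode and StaggeredRepair are hypothetical
names, not Cassandra classes; a plain string stands in for the validation
tree; and the real process exchanges asynchronous messages between nodes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a replica taking part in the repair.
interface RepairNode {
    String name();
    void snapshot();        // step 1: pin the data as of repair start
    String validate();      // step 2: build a "tree" from the pinned data
    void releaseSnapshot(); // drop the references taken in step 1
}

// Toy implementation that records what happened, in order, in a shared log.
final class InMemoryNode implements RepairNode {
    private final String name;
    private final String data;
    private final List<String> log;
    private String pinned;

    InMemoryNode(String name, String data, List<String> log) {
        this.name = name;
        this.data = data;
        this.log = log;
    }

    public String name() { return name; }
    public void snapshot() { pinned = data; log.add("snapshot " + name); }
    public String validate() { log.add("validate " + name); return pinned; }
    public void releaseSnapshot() { pinned = null; log.add("release " + name); }
}

final class StaggeredRepair {
    /** Returns "x/y" for every pair of nodes whose trees differ. */
    static List<String> run(List<RepairNode> nodes) {
        // 1) Ask every node to hold its references first, so all of them
        //    validate the same point in time.
        for (RepairNode node : nodes) node.snapshot();

        // 2) Validate one node at a time; each node releases its references
        //    as soon as its own validation is done.
        Map<String, String> trees = new LinkedHashMap<>();
        for (RepairNode node : nodes) {
            trees.put(node.name(), node.validate());
            node.releaseSnapshot();
        }

        // 3) All trees are now held by the requester: diff every pair.
        List<String> names = new ArrayList<>(trees.keySet());
        List<String> differences = new ArrayList<>();
        for (int i = 0; i < names.size(); i++)
            for (int j = i + 1; j < names.size(); j++)
                if (!trees.get(names.get(i)).equals(trees.get(names.get(j))))
                    differences.add(names.get(i) + "/" + names.get(j));
        return differences;
    }
}
```

The point of the ordering is that only one node does validation work at any
moment, while the up-front snapshot keeps the resulting trees comparable.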