[
https://issues.apache.org/jira/browse/CASSANDRA-9876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Geoffrey Yu updated CASSANDRA-9876:
-----------------------------------
Attachment: 9876-dtest-master.txt
9876-trunk-v2.txt
Thanks for the quick review! I’ve attached a new patch that addresses your
comments, except for one on which I wanted to get some more feedback first.
I also attached a patch that adds one dtest for the pull repair. It works
nearly identically to the token range repair test, except that it asserts
that one of the nodes only sends data and the other only receives it.
{quote}
I don't think it's necessary to make specifying --start-token and --end-token
mandatory, since if that is not specified it will just pull repair all common
ranges between specified hosts.
{quote}
I added the check for a token range because the repair code, as it is now,
doesn’t restrict itself to the ranges common to the specified hosts. I wasn’t
sure whether this was the intended behavior or a bug.
To replicate the issue, just create a 3 node cluster, add a keyspace with
replication factor 2, and run a regular repair through nodetool on that
keyspace with exactly two nodes specified.
This happens because, if no ranges are specified, the repair will [add
all ranges on the local
node|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/StorageService.java#L3137].
Then, when we reach {{RepairRunnable}}, we try to find a list of neighbors for
each range
(https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/repair/RepairRunnable.java#L160-L162).
The problem is that not every range the local node owns is necessarily also
owned by the remote node we specified through the nodetool command. In the
example above, only one range will be common between any two nodes. Because of
this, the [check
here|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/ActiveRepairService.java#L246-L251]
may throw an exception, which aborts the repair.
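To make the failure mode concrete, here is a rough Python sketch of the situation on a toy ring. It is only a simplified model, not the actual Cassandra code: the node names, the tokens, and the functions {{replica_map}}, {{local_ranges}}, {{common_ranges}} and {{plan_repair}} are all made up for illustration, and replica placement is assumed to be SimpleStrategy-like (the range owner plus the next node clockwise).

```python
# Toy model of a 3-node ring with replication factor 2.
# Everything here is hypothetical and only mirrors the behavior described
# above; it is not how Cassandra implements repair.
TOKENS = {"A": 0, "B": 100, "C": 200}  # assumed node tokens
RF = 2


def replica_map(tokens, rf):
    """Map each range (start, end] to its replicas: the node owning `end`
    plus the next rf - 1 nodes clockwise on the ring."""
    ring = sorted(tokens, key=tokens.get)
    n = len(ring)
    result = {}
    for i, node in enumerate(ring):
        token_range = (tokens[ring[i - 1]], tokens[node])
        result[token_range] = [ring[(i + k) % n] for k in range(rf)]
    return result


def local_ranges(replicas, node):
    """All ranges replicated on `node` (what gets added when none are given)."""
    return [r for r, owners in replicas.items() if node in owners]


def common_ranges(replicas, a, b):
    """Ranges replicated on both `a` and `b`."""
    return [r for r, owners in replicas.items() if a in owners and b in owners]


def plan_repair(replicas, local, hosts, ranges=None):
    """Pick, per range, the neighbors that are also in the supplied host list.
    Raises if a range has no such neighbor, like the check described above."""
    if ranges is None:
        ranges = local_ranges(replicas, local)
    plan = {}
    for token_range in ranges:
        neighbors = set(replicas[token_range]) - {local}
        selected = neighbors & set(hosts)
        if not selected:
            raise ValueError(
                "Repair requires at least two endpoints that are neighbours "
                "before it can continue (range %s)" % (token_range,)
            )
        plan[token_range] = selected
    return plan


if __name__ == "__main__":
    replicas = replica_map(TOKENS, RF)
    print("ranges on A:", local_ranges(replicas, "A"))
    print("common to A and B:", common_ranges(replicas, "A", "B"))
    try:
        plan_repair(replicas, "A", ["A", "B"])
    except ValueError as err:
        print("repair aborted:", err)
    # Restricting the repair to the common range avoids the exception.
    print(plan_repair(replicas, "A", ["A", "B"], common_ranges(replicas, "A", "B")))
```

In this model node A replicates two ranges but shares only one of them with B, so a repair from A restricted to hosts A and B fails on the range that A shares with C. Passing only the common range succeeds, which is what requiring a token range achieves in the patch.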
If this is intended behavior, then forcing the user to specify a token range
that is common between the nodes prevents that exception from being thrown.
Otherwise, the error message “Repair requires at least two endpoints that are
neighbours before it can continue” can be confusing to the operator, since the
two specified nodes may actually share a common range. What do you think?
> One way targeted repair
> -----------------------
>
> Key: CASSANDRA-9876
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9876
> Project: Cassandra
> Issue Type: Improvement
> Reporter: sankalp kohli
> Assignee: Geoffrey Yu
> Priority: Minor
> Fix For: 3.x
>
> Attachments: 9876-dtest-master.txt, 9876-trunk-v2.txt, 9876-trunk.txt
>
>
> Many applications use C* by writing to one local DC. The other DC is used
> when the local DC is unavailable. When the local DC becomes available, we
> want to run a targeted repair b/w one endpoint from each DC to minimize the
> data transfer over WAN. In this case, it will be helpful to do a one way
> repair in which data will only be streamed from other DC to local DC instead
> of streaming the data both ways. This will further minimize the traffic over
> WAN. This feature should only be supported if a targeted repair is run
> involving 2 hosts.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)