[
https://issues.apache.org/jira/browse/CASSANDRA-9876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415478#comment-15415478
]
Paulo Motta commented on CASSANDRA-9876:
----------------------------------------
Thanks for the follow-up. Updated patch and dtests LGTM.
bq. The reason why I added in the check for a token range was that the repair
code as it is now doesn’t actually add only the common ranges between the
specified hosts. I wasn’t sure if this was the intended behavior or a bug.
You're right, thanks for pointing this out. I had the {{-pr}} option in
mind, but it seems like it's not possible to combine {{-pr}} and {{-hosts}}
since CASSANDRA-7317. As a matter of fact this limitation was discussed on
parent ticket CASSANDRA-6440, and it seems like it's expected behavior.
bq. If this is intended behavior, then forcing the user to specify a token
range that is common between the nodes prevents that exception from being
thrown. Otherwise the error message, “Repair requires at least two endpoints
that are neighbours before it can continue” can be confusing to the operator
since the two specified nodes may actually share a common range.
Agreed, in any case I updated the error message to the following to make it
clearer when {{--pull}} is not specified:
{noformat}
Specified hosts [127.0.0.2, 127.0.0.1] do not share range
(-3074457345618258503,3074457345618258602] needed for repair. Either restrict
repair ranges with -st/-et options, or specify one of the neighbors that share
this range with this node: [/127.0.0.3, /127.0.0.4, /127.0.0.6].
{noformat}
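To illustrate the kind of check behind this message, here is a minimal sketch, not the code in the patch. It assumes the neighbors of a range and the operator-specified hosts are available as plain string collections; the class name {{HostRangeCheck}} and the method {{neighborsToRepair}} are hypothetical and do not exist in Cassandra.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, for illustration only; not part of Cassandra.
final class HostRangeCheck {
    /**
     * Keeps only the neighbors of a range that the operator also listed.
     * Throws when none of the specified hosts replicate the range, using
     * wording modelled on the error message quoted above.
     */
    static Set<String> neighborsToRepair(String range,
                                         Set<String> rangeNeighbors,
                                         Collection<String> specifiedHosts) {
        // Copy first so the caller's neighbor set is left untouched.
        Set<String> selected = new LinkedHashSet<>(rangeNeighbors);
        selected.retainAll(specifiedHosts);
        if (selected.isEmpty()) {
            throw new IllegalArgumentException(String.format(
                "Specified hosts %s do not share range %s needed for repair. "
                + "Either restrict repair ranges with -st/-et options, or specify one of "
                + "the neighbors that share this range with this node: %s.",
                specifiedHosts, range, rangeNeighbors));
        }
        return selected;
    }
}
```

With neighbors {{/127.0.0.3}} and {{/127.0.0.4}} for a range, specifying {{/127.0.0.1}} and {{/127.0.0.3}} would select {{/127.0.0.3}}, while specifying only {{/127.0.0.1}} and {{/127.0.0.2}} would raise the error.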
When trying to reproduce this, I noticed two minor problems with the repair
command so I included 2 ninja commits to fix those (could you have a look?):
1. When there is an exception while running repair, the
[RepairRunner|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/tools/RepairRunner.java#L108]
prints a {{\[2016-08-10 09:16:41,291\] null}} message after the actual error
message due to {{RepairRunnable}} not including any message in the {{COMPLETE}}
event on
[fireErrorAndComplete|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/repair/RepairRunnable.java#L106],
so I added a {{Repair command #x finished with error}} message to avoid null
being printed when there is an error during repair.
2. Currently the {{\-\-dc}} and {{\-\-hosts}} options are mutually exclusive on
[ActiveRepairService.getNeighbors|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/ActiveRepairService.java#L226],
but if you specify them together the {{--hosts}} option is silently ignored,
so I added a minor check to avoid combining these two options.
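The two fixes above can be sketched roughly as follows. This is an illustration under assumptions, not the committed code: the class {{RepairCommandSketch}}, both method names, the wording of the option error, and the non-error completion text are all hypothetical. Only the {{Repair command #x finished with error}} wording comes from the description above.

```java
import java.util.Collection;

// Hypothetical sketch, for illustration only; not part of Cassandra.
final class RepairCommandSketch {
    /**
     * Rejects a repair request that names both data centers and hosts,
     * instead of silently ignoring the hosts. The error text is an assumption.
     */
    static void validateOptions(Collection<String> dataCenters, Collection<String> hosts) {
        if (!dataCenters.isEmpty() && !hosts.isEmpty()) {
            throw new IllegalArgumentException("Cannot combine -dc and -hosts options.");
        }
    }

    /**
     * Always returns a non-null completion message, so a client printing it
     * never shows "null". The success wording is an assumption.
     */
    static String completionMessage(int cmd, boolean failed) {
        return failed
            ? String.format("Repair command #%d finished with error", cmd)
            : String.format("Repair command #%d finished", cmd);
    }
}
```

Failing fast on the conflicting options makes the problem visible to the operator at submission time rather than leaving them to work out why the listed hosts were not used.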
Updated branch and CI submission links are below:
||trunk||
|[branch|https://github.com/apache/cassandra/compare/trunk...pauloricardomg:trunk-9876]|
|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-9876-testall/lastCompletedBuild/testReport/]|
|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-9876-dtest/lastCompletedBuild/testReport/]|
Once CI results look good and you have verified the additional changes, I will
mark this as ready to commit. Can you open a dtest pull request to
https://github.com/riptano/cassandra-dtest ?
> One way targeted repair
> -----------------------
>
> Key: CASSANDRA-9876
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9876
> Project: Cassandra
> Issue Type: Improvement
> Reporter: sankalp kohli
> Assignee: Geoffrey Yu
> Priority: Minor
> Fix For: 3.x
>
> Attachments: 9876-dtest-master.txt, 9876-trunk-v2.txt, 9876-trunk.txt
>
>
> Many applications use C* by writing to one local DC. The other DC is used
> when the local DC is unavailable. When the local DC becomes available, we
> want to run a targeted repair b/w one endpoint from each DC to minimize the
> data transfer over WAN. In this case, it will be helpful to do a one way
> repair in which data will only be streamed from other DC to local DC instead
> of streaming the data both ways. This will further minimize the traffic over
> WAN. This feature should only be supported if a targeted repair is run
> involving 2 hosts.