[
https://issues.apache.org/jira/browse/RATIS-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kaijie Chen updated RATIS-1769:
-------------------------------
Description:
This is a follow-up to RATIS-1762. -TransferCommand should not change the priority
of peers (or at least not by default).-
-Sadly, this will break backward compatibility. But version 3.0 hasn't been
released, so it might be OK.-
-Add a new TransferLeadershipCommand which will not change the priority of peers
when transferring leadership.-
-The old TransferCommand is deprecated and kept as is for backward
compatibility reasons.-
Try to avoid changing priorities before transferring leadership in TransferCommand.
It will fall back to "transfer leadership by changing priority" for backward
compatibility.
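The try-first, fall-back-second behavior described above can be sketched roughly as follows. This is an illustration only, not the actual TransferCommand code: the Cluster interface, its two methods, and the attempt limit are hypothetical names introduced for the sketch. The assumption that the target's priority is raised by one after each failed attempt is inferred from the 2.4.1 server log in this description, not confirmed.

```java
import java.util.ArrayList;
import java.util.List;

class TransferFallbackSketch {
  /** Hypothetical stand-in for the requests the shell sends; not a Ratis API. */
  interface Cluster {
    boolean tryTransfer(String peer);             // true if the plain transfer request succeeded
    void setPriority(String peer, int priority);  // assumed way to raise the target's priority
  }

  /**
   * Tries a plain transfer first. Only if that fails, raises the target's
   * priority by one and tries again, up to maxAttempts transfer attempts.
   * Returns the steps taken, in order.
   */
  static List<String> transfer(Cluster cluster, String peer, int highestPriority, int maxAttempts) {
    List<String> steps = new ArrayList<>();
    int priority = highestPriority;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      steps.add("transfer " + peer);
      if (cluster.tryTransfer(peer)) {
        steps.add("initiated");
        return steps;
      }
      if (attempt < maxAttempts) {
        // Fallback path: touch priorities only after a plain transfer failed.
        priority++;
        cluster.setPriority(peer, priority);
        steps.add("priority " + peer + " -> " + priority);
      }
    }
    steps.add("failed");
    return steps;
  }

  public static void main(String[] args) {
    // Fake server that accepts the plain transfer, so no priority is changed.
    Cluster accepting = new Cluster() {
      public boolean tryTransfer(String peer) { return true; }
      public void setPriority(String peer, int priority) {
        throw new IllegalStateException("priority should not be changed");
      }
    };
    System.out.println(transfer(accepting, "n1", 3, 3));
  }
}
```

The point of the ordering is that a server which accepts the plain request never has its peer priorities modified; the priority change is reached only on the failure path.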
h3. Example
{code:java}
$ bin/ratis sh election transfer -peers 127.0.0.1:10024,127.0.0.1:10124,127.0.0.1:11124 -address 127.0.0.1:10124
[main] INFO org.reflections.Reflections - Reflections took 122 ms to scan 1 urls, producing 5 keys and 18 values
[main] WARN org.apache.ratis.metrics.MetricRegistries - Found multiple MetricRegistries implementations: class org.apache.ratis.metrics.impl.MetricRegistriesImpl, class org.apache.ratis.metrics.dropwizard3.Dm3MetricRegistriesImpl. Using first found implementation: org.apache.ratis.metrics.impl.MetricRegistriesImpl@1e67a849
Transferring leadership to peer n1 with address <127.0.0.1:10124>
Transferring leadership initiated{code}
h3. Backward compatibility
Ratis shell version: {{{}3.0.0-SNAPSHOT{}}}.
Ratis server version: {{{}2.4.1{}}}.
{code:java}
$ bin/ratis sh election transfer -peers 127.0.0.1:10024,127.0.0.1:10124,127.0.0.1:11124 -address 127.0.0.1:10124
[main] INFO org.reflections.Reflections - Reflections took 135 ms to scan 1 urls, producing 5 keys and 18 values
[main] WARN org.apache.ratis.metrics.MetricRegistries - Found multiple MetricRegistries implementations: class org.apache.ratis.metrics.impl.MetricRegistriesImpl, class org.apache.ratis.metrics.dropwizard3.Dm3MetricRegistriesImpl. Using first found implementation: org.apache.ratis.metrics.impl.MetricRegistriesImpl@1e67a849
Transferring leadership to peer n1 with address <127.0.0.1:10124>
Changing priority of peer n1 with address <127.0.0.1:10124> to 4
Transferring leadership to peer n1 with address <127.0.0.1:10124>
Changing priority of peer n1 with address <127.0.0.1:10124> to 5
Transferring leadership initiated{code}
h3. In case of failure
In most cases, simply retrying will fix the problem. Users can also set the
timeout manually with the {{-timeout}} option.
{code:java}
$ bin/ratis sh election transfer -peers 127.0.0.1:10024,127.0.0.1:10124,127.0.0.1:11124 -address 127.0.0.1:10124
[main] INFO org.reflections.Reflections - Reflections took 135 ms to scan 1 urls, producing 5 keys and 18 values
[main] WARN org.apache.ratis.metrics.MetricRegistries - Found multiple MetricRegistries implementations: class org.apache.ratis.metrics.impl.MetricRegistriesImpl, class org.apache.ratis.metrics.dropwizard3.Dm3MetricRegistriesImpl. Using first found implementation: org.apache.ratis.metrics.impl.MetricRegistriesImpl@1e67a849
Transferring leadership to peer n1 with address <127.0.0.1:10124>
Failed to transfer peer n1 with address <127.0.0.1:10124>:
org.apache.ratis.protocol.exceptions.TransferLeadershipException: n2@group-ABB3109A44C1: Failed to transfer leadership to n1 (timed out 3000ms): current leader is n2
    at org.apache.ratis.server.impl.TransferLeadership$PendingRequest.complete(TransferLeadership.java:67)
    at org.apache.ratis.server.impl.TransferLeadership.lambda$finish$7(TransferLeadership.java:163)
    at java.util.Optional.ifPresent(Optional.java:159)
    at org.apache.ratis.server.impl.TransferLeadership.finish(TransferLeadership.java:163)
    at org.apache.ratis.server.impl.TransferLeadership.lambda$start$4(TransferLeadership.java:136)
    at org.apache.ratis.util.TimeoutTimer.lambda$onTimeout$2(TimeoutTimer.java:101)
    at org.apache.ratis.util.LogUtils.runAndLog(LogUtils.java:38)
    at org.apache.ratis.util.LogUtils$1.run(LogUtils.java:79)
    at org.apache.ratis.util.TimeoutTimer$Task.run(TimeoutTimer.java:55)
    at java.util.TimerThread.mainLoop(Timer.java:555)
    at java.util.TimerThread.run(Timer.java:505){code}
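The "retry, possibly with a longer timeout" advice above could be automated along these lines. This is a minimal sketch under stated assumptions: the retry helper and the doubling policy are hypothetical choices made for illustration, and the transfer itself is stood in for by a predicate that receives a timeout in milliseconds. The 3000 ms starting value is taken from the "timed out 3000ms" message in the log above.

```java
import java.util.function.LongPredicate;

class TransferRetrySketch {
  /**
   * Calls attempt with a timeout in milliseconds and doubles the timeout
   * after each failure. Returns the timeout that succeeded, or -1 if every
   * attempt failed.
   */
  static long retry(LongPredicate attempt, long initialTimeoutMs, int maxAttempts) {
    long timeoutMs = initialTimeoutMs;
    for (int i = 0; i < maxAttempts; i++) {
      if (attempt.test(timeoutMs)) {
        return timeoutMs;
      }
      timeoutMs *= 2; // assumed policy: give the next attempt more time
    }
    return -1;
  }

  public static void main(String[] args) {
    // Fake transfer that only succeeds when given at least 5000 ms.
    long used = retry(timeoutMs -> timeoutMs >= 5000, 3000, 3);
    System.out.println("succeeded with timeout " + used + " ms");
  }
}
```

In a real script the predicate would invoke the shell command with the chosen timeout and report whether the transfer was initiated.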
was:
This is a follow-up to RATIS-1762. -TransferCommand should not change the priority
of peers (or at least not by default).-
-Sadly, this will break backward compatibility. But version 3.0 hasn't been
released, so it might be OK.-
-Add a new TransferLeadershipCommand which will not change the priority of peers
when transferring leadership.-
-The old TransferCommand is deprecated and kept as is for backward
compatibility reasons.-
Try to avoid changing priorities before transferring leadership in TransferCommand.
It will fall back to "transfer leadership by changing priority" for backward
compatibility.
h3. Example
{code:java}
$ bin/ratis sh election transfer -peers 127.0.0.1:10024,127.0.0.1:10124,127.0.0.1:11124 -address 127.0.0.1:10124
[main] INFO org.reflections.Reflections - Reflections took 157 ms to scan 1 urls, producing 5 keys and 18 values
[main] WARN org.apache.ratis.metrics.MetricRegistries - Found multiple MetricRegistries implementations: class org.apache.ratis.metrics.impl.MetricRegistriesImpl, class org.apache.ratis.metrics.dropwizard3.Dm3MetricRegistriesImpl. Using first found implementation: org.apache.ratis.metrics.impl.MetricRegistriesImpl@1e67a849
Transferring leadership to server with address <127.0.0.1:10124>
Transferring leadership initiated{code}
h3. Backward compatibility
Ratis shell version: {{{}3.0.0-SNAPSHOT{}}}.
Ratis server version: {{{}2.4.1{}}}.
{code:java}
$ bin/ratis sh election transfer -peers 127.0.0.1:10024,127.0.0.1:10124,127.0.0.1:11124 -address 127.0.0.1:10124
[main] INFO org.reflections.Reflections - Reflections took 131 ms to scan 1 urls, producing 5 keys and 18 values
[main] WARN org.apache.ratis.metrics.MetricRegistries - Found multiple MetricRegistries implementations: class org.apache.ratis.metrics.impl.MetricRegistriesImpl, class org.apache.ratis.metrics.dropwizard3.Dm3MetricRegistriesImpl. Using first found implementation: org.apache.ratis.metrics.impl.MetricRegistriesImpl@1e67a849
Transferring leadership to server with address <127.0.0.1:10124>
Changing priority of <127.0.0.1:10124> to 2:
Transferring leadership to server with address <127.0.0.1:10124>
Changing priority of <127.0.0.1:10124> to 3:
Transferring leadership initiated{code}
h3. In case of failure
In most cases, simply retrying will fix the problem. Users can also set the
timeout manually with the {{-timeout}} option.
{code:java}
$ bin/ratis sh election transfer -peers 127.0.0.1:10024,127.0.0.1:10124,127.0.0.1:11124 -address 127.0.0.1:10124
[main] INFO org.reflections.Reflections - Reflections took 135 ms to scan 1 urls, producing 5 keys and 18 values
[main] WARN org.apache.ratis.metrics.MetricRegistries - Found multiple MetricRegistries implementations: class org.apache.ratis.metrics.impl.MetricRegistriesImpl, class org.apache.ratis.metrics.dropwizard3.Dm3MetricRegistriesImpl. Using first found implementation: org.apache.ratis.metrics.impl.MetricRegistriesImpl@1e67a849
Transferring leadership to server with address <127.0.0.1:10124>
Failed to transfer peer n1 with address 127.0.0.1:10124:
org.apache.ratis.protocol.exceptions.TransferLeadershipException: n2@group-ABB3109A44C1: Failed to transfer leadership to n1 (timed out 3000ms): current leader is n2
    at org.apache.ratis.server.impl.TransferLeadership$PendingRequest.complete(TransferLeadership.java:67)
    at org.apache.ratis.server.impl.TransferLeadership.lambda$finish$7(TransferLeadership.java:163)
    at java.util.Optional.ifPresent(Optional.java:159)
    at org.apache.ratis.server.impl.TransferLeadership.finish(TransferLeadership.java:163)
    at org.apache.ratis.server.impl.TransferLeadership.lambda$start$4(TransferLeadership.java:136)
    at org.apache.ratis.util.TimeoutTimer.lambda$onTimeout$2(TimeoutTimer.java:101)
    at org.apache.ratis.util.LogUtils.runAndLog(LogUtils.java:38)
    at org.apache.ratis.util.LogUtils$1.run(LogUtils.java:79)
    at org.apache.ratis.util.TimeoutTimer$Task.run(TimeoutTimer.java:55)
    at java.util.TimerThread.mainLoop(Timer.java:555)
    at java.util.TimerThread.run(Timer.java:505){code}
> Avoid changing priorities in TransferCommand unless necessary
> -------------------------------------------------------------
>
> Key: RATIS-1769
> URL: https://issues.apache.org/jira/browse/RATIS-1769
> Project: Ratis
> Issue Type: Sub-task
> Reporter: Kaijie Chen
> Priority: Major
> Time Spent: 3h 40m
> Remaining Estimate: 0h