[
https://issues.apache.org/jira/browse/RATIS-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kaijie Chen updated RATIS-1769:
-------------------------------
Description:
This is a follow-up to RATIS-1762. -TransferCommand should not change the priority
of peers (or at least not by default).-
-Sadly, this will break backward compatibility. But version 3.0 hasn't been
released, so it might be OK.-
-Add a new TransferLeadershipCommand which will not change the priority of peers
when transferring leadership.-
-The old TransferCommand is deprecated and kept as is for backward
compatibility reasons.-
Try to avoid changing priorities before transferring leadership in TransferCommand.
It will fall back to "transfer leadership by changing priority" for backward
compatibility.
If the new leader's priority is lower than the highest priority, it will first
set its priority to be the *same* as the highest priority.
Then it will try to transfer leadership by the process described in RATIS-1762.
If it still gets {_}TransferLeadershipException("it does not have the highest
priority"){_},
it will fall back to legacy mode and set the new leader's priority to {*}highest
+ 1{*}.
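The steps above can be modeled with a small, self-contained sketch. This is not the actual TransferCommand code and it does not use any Ratis API: the {{Server}} interface, the {{TIE_OK}} and {{STRICT}} rules, and the peer priorities are all hypothetical, chosen only so that the priority changes resemble the ones in the logs in this description.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TransferSketch {
  /** Hypothetical stand-in for a server's rule on whether the target may become leader. */
  interface Server {
    boolean accepts(Map<String, Integer> priorities, String target);
  }

  // Assumed newer behavior: a target tied for the highest priority is accepted.
  static final Server TIE_OK = (p, t) -> p.get(t) >= highest(p);

  // Assumed legacy behavior: the target must be strictly above every other peer.
  static final Server STRICT = (p, t) -> p.entrySet().stream()
      .allMatch(e -> e.getKey().equals(t) || e.getValue() < p.get(t));

  static int highest(Map<String, Integer> priorities) {
    return priorities.values().stream().mapToInt(Integer::intValue).max().orElse(0);
  }

  /** Runs the described steps on a mutable priority map and returns a log of actions. */
  static List<String> transfer(Server server, Map<String, Integer> priorities, String target) {
    List<String> log = new ArrayList<>();
    int highest = highest(priorities);
    // Step 1: raise the target to match the highest priority only if it is lower.
    if (priorities.get(target) < highest) {
      priorities.put(target, highest);
      log.add("set " + target + " to " + highest);
    }
    // Step 2: try the transfer.
    log.add("transfer");
    if (server.accepts(priorities, target)) {
      log.add("ok");
      return log;
    }
    // Step 3: legacy fallback, make the target strictly the highest and retry.
    priorities.put(target, highest + 1);
    log.add("set " + target + " to " + (highest + 1));
    log.add("transfer");
    log.add(server.accepts(priorities, target) ? "ok" : "failed");
    return log;
  }

  public static void main(String[] args) {
    Map<String, Integer> peers = new LinkedHashMap<>();
    peers.put("n0", 1);
    peers.put("n1", 0);
    peers.put("n2", 4);
    // Each call gets its own copy because transfer() mutates the map.
    System.out.println(transfer(TIE_OK, new LinkedHashMap<>(peers), "n1"));
    System.out.println(transfer(STRICT, new LinkedHashMap<>(peers), "n1"));
  }
}
```

With the assumed priorities, the {{TIE_OK}} run stops after raising n1 to 4, while the {{STRICT}} run needs the second change to 5. The sketch ignores the case where the highest priority is already the maximum integer value.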
h3. Example
{code:java}
$ bin/ratis sh election transfer -peers 127.0.0.1:10024,127.0.0.1:10124,127.0.0.1:11124 -address 127.0.0.1:10124
[main] INFO org.reflections.Reflections - Reflections took 122 ms to scan 1 urls, producing 5 keys and 18 values
[main] WARN org.apache.ratis.metrics.MetricRegistries - Found multiple MetricRegistries implementations: class org.apache.ratis.metrics.impl.MetricRegistriesImpl, class org.apache.ratis.metrics.dropwizard3.Dm3MetricRegistriesImpl. Using first found implementation: org.apache.ratis.metrics.impl.MetricRegistriesImpl@1e67a849
Transferring leadership to peer n1 with address 127.0.0.1:10124
Transferring leadership initiated{code}
h3. Backward compatibility
Ratis shell version: {{{}3.0.0-SNAPSHOT{}}}.
Ratis server version: {{{}2.4.1{}}}.
{code:java}
$ bin/ratis sh election transfer -peers 127.0.0.1:10024,127.0.0.1:10124,127.0.0.1:11124 -address 127.0.0.1:10124
[main] INFO org.reflections.Reflections - Reflections took 135 ms to scan 1 urls, producing 5 keys and 18 values
[main] WARN org.apache.ratis.metrics.MetricRegistries - Found multiple MetricRegistries implementations: class org.apache.ratis.metrics.impl.MetricRegistriesImpl, class org.apache.ratis.metrics.dropwizard3.Dm3MetricRegistriesImpl. Using first found implementation: org.apache.ratis.metrics.impl.MetricRegistriesImpl@1e67a849
Transferring leadership to peer n1 with address 127.0.0.1:10124
Changing priority of peer n1 with address 127.0.0.1:10124 to 4
Transferring leadership to peer n1 with address <127.0.0.1:10124>
Changing priority of peer n1 with address 127.0.0.1:10124 to 5
Transferring leadership initiated{code}
h3. In case of failure
In most cases, a simple retry will fix the problem. Users can also set the
timeout manually with the {{-timeout}} option.
{code:java}
$ bin/ratis sh election transfer -peers 127.0.0.1:10024,127.0.0.1:10124,127.0.0.1:11124 -address 127.0.0.1:10124
[main] INFO org.reflections.Reflections - Reflections took 135 ms to scan 1 urls, producing 5 keys and 18 values
[main] WARN org.apache.ratis.metrics.MetricRegistries - Found multiple MetricRegistries implementations: class org.apache.ratis.metrics.impl.MetricRegistriesImpl, class org.apache.ratis.metrics.dropwizard3.Dm3MetricRegistriesImpl. Using first found implementation: org.apache.ratis.metrics.impl.MetricRegistriesImpl@1e67a849
Transferring leadership to peer n1 with address 127.0.0.1:10124
Failed to transfer peer n1 with address 127.0.0.1:10124:
org.apache.ratis.protocol.exceptions.TransferLeadershipException: n2@group-ABB3109A44C1: Failed to transfer leadership to n1 (timed out 3000ms): current leader is n2
at org.apache.ratis.server.impl.TransferLeadership$PendingRequest.complete(TransferLeadership.java:67)
at org.apache.ratis.server.impl.TransferLeadership.lambda$finish$7(TransferLeadership.java:163)
at java.util.Optional.ifPresent(Optional.java:159)
at org.apache.ratis.server.impl.TransferLeadership.finish(TransferLeadership.java:163)
at org.apache.ratis.server.impl.TransferLeadership.lambda$start$4(TransferLeadership.java:136)
at org.apache.ratis.util.TimeoutTimer.lambda$onTimeout$2(TimeoutTimer.java:101)
at org.apache.ratis.util.LogUtils.runAndLog(LogUtils.java:38)
at org.apache.ratis.util.LogUtils$1.run(LogUtils.java:79)
at org.apache.ratis.util.TimeoutTimer$Task.run(TimeoutTimer.java:55)
at java.util.TimerThread.mainLoop(Timer.java:555)
at java.util.TimerThread.run(Timer.java:505){code}
> Avoid changing priorities in TransferCommand unless necessary
> -------------------------------------------------------------
>
> Key: RATIS-1769
> URL: https://issues.apache.org/jira/browse/RATIS-1769
> Project: Ratis
> Issue Type: Sub-task
> Reporter: Kaijie Chen
> Priority: Major
> Time Spent: 3h 40m
> Remaining Estimate: 0h