[ 
https://issues.apache.org/jira/browse/RATIS-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaijie Chen updated RATIS-1769:
-------------------------------
    Description: 
This is a followup of RATIS-1762. -TransferCommand should not change priority 
of peers (or at least not by default).-

-Sadly this will break backward compatibility. But version 3.0 hasn't been 
released, so it might be OK.-

-Add a new TransferLeadershipCommand which will not change priority of peers 
when transfer leadership.-
-The old TransferCommand is deprecated and keeped as is for backward 
compatibility reasons.-

Try to avoid changing priorities before transfer leadership in TransferCommand.
It will fallback to "transfer leadership by changing priority" for backward 
compatibility.

If the new leader's priority is lower than the highest priority, it will first 
set the its priority to be the *same* with highest priority.
Then try to transfer leadership by the process described in RATIS-1762.
If it still gets {_}TransferLeadershipException("it does not have the highest 
priority"){_},
it will fallback to legacy mode and set the new leader's priority to {*}highest 
+ 1{*}.
h3. Example
{code:java}
$ bin/ratis sh election transfer -peers 
127.0.0.1:10024,127.0.0.1:10124,127.0.0.1:11124 -address 127.0.0.1:10124
[main] INFO org.reflections.Reflections - Reflections took 122 ms to scan 1 
urls, producing 5 keys and 18 values
[main] WARN org.apache.ratis.metrics.MetricRegistries - Found multiple 
MetricRegistries implementations: class 
org.apache.ratis.metrics.impl.MetricRegistriesImpl, class 
org.apache.ratis.metrics.dropwizard3.Dm3MetricRegistriesImpl. Using first found 
implementation: org.apache.ratis.metrics.impl.MetricRegistriesImpl@1e67a849
Transferring leadership to peer n1 with address 127.0.0.1:10124
Transferring leadership initiated{code}
h3. Backward compatibility

Ratis shell version: {{{}3.0.0-SNAPSHOT{}}}.
Ratis server version: {{{}2.4.1{}}}.
{code:java}
$ bin/ratis sh election transfer -peers 
127.0.0.1:10024,127.0.0.1:10124,127.0.0.1:11124 -address 127.0.0.1:10124
[main] INFO org.reflections.Reflections - Reflections took 135 ms to scan 1 
urls, producing 5 keys and 18 values
[main] WARN org.apache.ratis.metrics.MetricRegistries - Found multiple 
MetricRegistries implementations: class 
org.apache.ratis.metrics.impl.MetricRegistriesImpl, class 
org.apache.ratis.metrics.dropwizard3.Dm3MetricRegistriesImpl. Using first found 
implementation: org.apache.ratis.metrics.impl.MetricRegistriesImpl@1e67a849
Transferring leadership to peer n1 with address 127.0.0.1:10124
Changing priority of peer n1 with address 127.0.0.1:10124 to 4
Transferring leadership to peer n1 with address <127.0.0.1:10124>
Changing priority of peer n1 with address 127.0.0.1:10124 to 5
Transferring leadership initiated{code}
h3. In case of failure

In most cases, just a retry will fix the problem. And users can also set 
timeout manually by {{-timeout}} option.
{code:java}
$ bin/ratis sh election transfer -peers 
127.0.0.1:10024,127.0.0.1:10124,127.0.0.1:11124 -address 127.0.0.1:10124
[main] INFO org.reflections.Reflections - Reflections took 135 ms to scan 1 
urls, producing 5 keys and 18 values
[main] WARN org.apache.ratis.metrics.MetricRegistries - Found multiple 
MetricRegistries implementations: class 
org.apache.ratis.metrics.impl.MetricRegistriesImpl, class 
org.apache.ratis.metrics.dropwizard3.Dm3MetricRegistriesImpl. Using first found 
implementation: org.apache.ratis.metrics.impl.MetricRegistriesImpl@1e67a849
Transferring leadership to peer n1 with address 127.0.0.1:10124
Failed to transfer peer n1 with address 127.0.0.1:10124: 
org.apache.ratis.protocol.exceptions.TransferLeadershipException: 
n2@group-ABB3109A44C1: Failed to transfer leadership to n1 (timed out 3000ms): 
current leader is n2
        at 
org.apache.ratis.server.impl.TransferLeadership$PendingRequest.complete(TransferLeadership.java:67)
        at 
org.apache.ratis.server.impl.TransferLeadership.lambda$finish$7(TransferLeadership.java:163)
        at java.util.Optional.ifPresent(Optional.java:159)
        at 
org.apache.ratis.server.impl.TransferLeadership.finish(TransferLeadership.java:163)
        at 
org.apache.ratis.server.impl.TransferLeadership.lambda$start$4(TransferLeadership.java:136)
        at 
org.apache.ratis.util.TimeoutTimer.lambda$onTimeout$2(TimeoutTimer.java:101)
        at org.apache.ratis.util.LogUtils.runAndLog(LogUtils.java:38)
        at org.apache.ratis.util.LogUtils$1.run(LogUtils.java:79)
        at org.apache.ratis.util.TimeoutTimer$Task.run(TimeoutTimer.java:55)
        at java.util.TimerThread.mainLoop(Timer.java:555)
        at java.util.TimerThread.run(Timer.java:505){code}

  was:
This is a followup of RATIS-1762. -TransferCommand should not change priority 
of peers (or at least not by default).-

-Sadly this will break backward compatibility. But version 3.0 hasn't been 
released, so it might be OK.-

-Add a new TransferLeadershipCommand which will not change priority of peers 
when transfer leadership.-
-The old TransferCommand is deprecated and keeped as is for backward 
compatibility reasons.-

Try to avoid changing priorities before transfer leadership in TransferCommand.
It will fallback to "transfer leadership by changing priority" for backward 
compatibility.

If the new leader's priority is lower than the highest priority, it will first 
set the its priority to be the *same* with highest priority.
Then try to transfer leadership by the process described in RATIS-1762.
If it still gets {_}TransferLeadershipException("it does not have the highest 
priority"){_},
it will fallback to legacy mode and set the new leader's priority to {*}highest 
+ 1{*}.
h3. Example
{code:java}
$ bin/ratis sh election transfer -peers 
127.0.0.1:10024,127.0.0.1:10124,127.0.0.1:11124 -address 127.0.0.1:10124
[main] INFO org.reflections.Reflections - Reflections took 122 ms to scan 1 
urls, producing 5 keys and 18 values
[main] WARN org.apache.ratis.metrics.MetricRegistries - Found multiple 
MetricRegistries implementations: class 
org.apache.ratis.metrics.impl.MetricRegistriesImpl, class 
org.apache.ratis.metrics.dropwizard3.Dm3MetricRegistriesImpl. Using first found 
implementation: org.apache.ratis.metrics.impl.MetricRegistriesImpl@1e67a849
Transferring leadership to peer n1 with address <127.0.0.1:10124>
Transferring leadership initiated{code}
h3. Backward compatibility

Ratis shell version: {{{}3.0.0-SNAPSHOT{}}}.
Ratis server version: {{{}2.4.1{}}}.
{code:java}
$ bin/ratis sh election transfer -peers 
127.0.0.1:10024,127.0.0.1:10124,127.0.0.1:11124 -address 127.0.0.1:10124
[main] INFO org.reflections.Reflections - Reflections took 135 ms to scan 1 
urls, producing 5 keys and 18 values
[main] WARN org.apache.ratis.metrics.MetricRegistries - Found multiple 
MetricRegistries implementations: class 
org.apache.ratis.metrics.impl.MetricRegistriesImpl, class 
org.apache.ratis.metrics.dropwizard3.Dm3MetricRegistriesImpl. Using first found 
implementation: org.apache.ratis.metrics.impl.MetricRegistriesImpl@1e67a849
Transferring leadership to peer n1 with address <127.0.0.1:10124>
Changing priority of peer n1 with address <127.0.0.1:10124> to 4
Transferring leadership to peer n1 with address <127.0.0.1:10124>
Changing priority of peer n1 with address <127.0.0.1:10124> to 5
Transferring leadership initiated{code}
h3. In case of failure

In most cases, just a retry will fix the problem. And users can also set 
timeout manually by {{-timeout}} option.
{code:java}
$ bin/ratis sh election transfer -peers 
127.0.0.1:10024,127.0.0.1:10124,127.0.0.1:11124 -address 127.0.0.1:10124
[main] INFO org.reflections.Reflections - Reflections took 135 ms to scan 1 
urls, producing 5 keys and 18 values
[main] WARN org.apache.ratis.metrics.MetricRegistries - Found multiple 
MetricRegistries implementations: class 
org.apache.ratis.metrics.impl.MetricRegistriesImpl, class 
org.apache.ratis.metrics.dropwizard3.Dm3MetricRegistriesImpl. Using first found 
implementation: org.apache.ratis.metrics.impl.MetricRegistriesImpl@1e67a849
Transferring leadership to peer n1 with address <127.0.0.1:10124>
Failed to transfer peer n1 with address <127.0.0.1:10124>: 
org.apache.ratis.protocol.exceptions.TransferLeadershipException: 
n2@group-ABB3109A44C1: Failed to transfer leadership to n1 (timed out 3000ms): 
current leader is n2
        at 
org.apache.ratis.server.impl.TransferLeadership$PendingRequest.complete(TransferLeadership.java:67)
        at 
org.apache.ratis.server.impl.TransferLeadership.lambda$finish$7(TransferLeadership.java:163)
        at java.util.Optional.ifPresent(Optional.java:159)
        at 
org.apache.ratis.server.impl.TransferLeadership.finish(TransferLeadership.java:163)
        at 
org.apache.ratis.server.impl.TransferLeadership.lambda$start$4(TransferLeadership.java:136)
        at 
org.apache.ratis.util.TimeoutTimer.lambda$onTimeout$2(TimeoutTimer.java:101)
        at org.apache.ratis.util.LogUtils.runAndLog(LogUtils.java:38)
        at org.apache.ratis.util.LogUtils$1.run(LogUtils.java:79)
        at org.apache.ratis.util.TimeoutTimer$Task.run(TimeoutTimer.java:55)
        at java.util.TimerThread.mainLoop(Timer.java:555)
        at java.util.TimerThread.run(Timer.java:505){code}


> Avoid changing priorities in TransferCommand unless necessary
> -------------------------------------------------------------
>
>                 Key: RATIS-1769
>                 URL: https://issues.apache.org/jira/browse/RATIS-1769
>             Project: Ratis
>          Issue Type: Sub-task
>            Reporter: Kaijie Chen
>            Priority: Major
>          Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> This is a followup of RATIS-1762. -TransferCommand should not change priority 
> of peers (or at least not by default).-
> -Sadly this will break backward compatibility. But version 3.0 hasn't been 
> released, so it might be OK.-
> -Add a new TransferLeadershipCommand which will not change priority of peers 
> when transfer leadership.-
> -The old TransferCommand is deprecated and keeped as is for backward 
> compatibility reasons.-
> Try to avoid changing priorities before transfer leadership in 
> TransferCommand.
> It will fallback to "transfer leadership by changing priority" for backward 
> compatibility.
> If the new leader's priority is lower than the highest priority, it will 
> first set the its priority to be the *same* with highest priority.
> Then try to transfer leadership by the process described in RATIS-1762.
> If it still gets {_}TransferLeadershipException("it does not have the highest 
> priority"){_},
> it will fallback to legacy mode and set the new leader's priority to 
> {*}highest + 1{*}.
> h3. Example
> {code:java}
> $ bin/ratis sh election transfer -peers 
> 127.0.0.1:10024,127.0.0.1:10124,127.0.0.1:11124 -address 127.0.0.1:10124
> [main] INFO org.reflections.Reflections - Reflections took 122 ms to scan 1 
> urls, producing 5 keys and 18 values
> [main] WARN org.apache.ratis.metrics.MetricRegistries - Found multiple 
> MetricRegistries implementations: class 
> org.apache.ratis.metrics.impl.MetricRegistriesImpl, class 
> org.apache.ratis.metrics.dropwizard3.Dm3MetricRegistriesImpl. Using first 
> found implementation: 
> org.apache.ratis.metrics.impl.MetricRegistriesImpl@1e67a849
> Transferring leadership to peer n1 with address 127.0.0.1:10124
> Transferring leadership initiated{code}
> h3. Backward compatibility
> Ratis shell version: {{{}3.0.0-SNAPSHOT{}}}.
> Ratis server version: {{{}2.4.1{}}}.
> {code:java}
> $ bin/ratis sh election transfer -peers 
> 127.0.0.1:10024,127.0.0.1:10124,127.0.0.1:11124 -address 127.0.0.1:10124
> [main] INFO org.reflections.Reflections - Reflections took 135 ms to scan 1 
> urls, producing 5 keys and 18 values
> [main] WARN org.apache.ratis.metrics.MetricRegistries - Found multiple 
> MetricRegistries implementations: class 
> org.apache.ratis.metrics.impl.MetricRegistriesImpl, class 
> org.apache.ratis.metrics.dropwizard3.Dm3MetricRegistriesImpl. Using first 
> found implementation: 
> org.apache.ratis.metrics.impl.MetricRegistriesImpl@1e67a849
> Transferring leadership to peer n1 with address 127.0.0.1:10124
> Changing priority of peer n1 with address 127.0.0.1:10124 to 4
> Transferring leadership to peer n1 with address <127.0.0.1:10124>
> Changing priority of peer n1 with address 127.0.0.1:10124 to 5
> Transferring leadership initiated{code}
> h3. In case of failure
> In most cases, just a retry will fix the problem. And users can also set 
> timeout manually by {{-timeout}} option.
> {code:java}
> $ bin/ratis sh election transfer -peers 
> 127.0.0.1:10024,127.0.0.1:10124,127.0.0.1:11124 -address 127.0.0.1:10124
> [main] INFO org.reflections.Reflections - Reflections took 135 ms to scan 1 
> urls, producing 5 keys and 18 values
> [main] WARN org.apache.ratis.metrics.MetricRegistries - Found multiple 
> MetricRegistries implementations: class 
> org.apache.ratis.metrics.impl.MetricRegistriesImpl, class 
> org.apache.ratis.metrics.dropwizard3.Dm3MetricRegistriesImpl. Using first 
> found implementation: 
> org.apache.ratis.metrics.impl.MetricRegistriesImpl@1e67a849
> Transferring leadership to peer n1 with address 127.0.0.1:10124
> Failed to transfer peer n1 with address 127.0.0.1:10124: 
> org.apache.ratis.protocol.exceptions.TransferLeadershipException: 
> n2@group-ABB3109A44C1: Failed to transfer leadership to n1 (timed out 
> 3000ms): current leader is n2
>       at 
> org.apache.ratis.server.impl.TransferLeadership$PendingRequest.complete(TransferLeadership.java:67)
>       at 
> org.apache.ratis.server.impl.TransferLeadership.lambda$finish$7(TransferLeadership.java:163)
>       at java.util.Optional.ifPresent(Optional.java:159)
>       at 
> org.apache.ratis.server.impl.TransferLeadership.finish(TransferLeadership.java:163)
>       at 
> org.apache.ratis.server.impl.TransferLeadership.lambda$start$4(TransferLeadership.java:136)
>       at 
> org.apache.ratis.util.TimeoutTimer.lambda$onTimeout$2(TimeoutTimer.java:101)
>       at org.apache.ratis.util.LogUtils.runAndLog(LogUtils.java:38)
>       at org.apache.ratis.util.LogUtils$1.run(LogUtils.java:79)
>       at org.apache.ratis.util.TimeoutTimer$Task.run(TimeoutTimer.java:55)
>       at java.util.TimerThread.mainLoop(Timer.java:555)
>       at java.util.TimerThread.run(Timer.java:505){code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to