[ 
https://issues.apache.org/jira/browse/RATIS-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Doroszlai updated RATIS-1769:
------------------------------------
    Fix Version/s: 2.5.0

> Avoid changing priorities in TransferCommand unless necessary
> -------------------------------------------------------------
>
>                 Key: RATIS-1769
>                 URL: https://issues.apache.org/jira/browse/RATIS-1769
>             Project: Ratis
>          Issue Type: Sub-task
>            Reporter: Kaijie Chen
>            Assignee: Kaijie Chen
>            Priority: Major
>             Fix For: 3.0.0, 2.5.0
>
>          Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> This is a followup of RATIS-1762. -TransferCommand should not change priority 
> of peers (or at least not by default).-
> -Sadly this will break backward compatibility. But version 3.0 hasn't been 
> released, so it might be OK.-
> -Add a new TransferLeadershipCommand which will not change priority of peers 
> when transfer leadership.-
> -The old TransferCommand is deprecated and keeped as is for backward 
> compatibility reasons.-
> Try to avoid changing priorities before transfer leadership in 
> TransferCommand.
> It will fallback to "transfer leadership by changing priority" for backward 
> compatibility.
> If the new leader's priority is lower than the highest priority, it will 
> first set the its priority to be the *same* with highest priority.
> Then try to transfer leadership by the process described in RATIS-1762.
> If it still gets {_}TransferLeadershipException("it does not have the highest 
> priority"){_},
> it will fallback to legacy mode and set the new leader's priority to 
> {*}highest + 1{*}.
> h3. Example
> {code:java}
> $ bin/ratis sh election transfer -peers 
> 127.0.0.1:10024,127.0.0.1:10124,127.0.0.1:11124 -address 127.0.0.1:10124
> [main] INFO org.reflections.Reflections - Reflections took 122 ms to scan 1 
> urls, producing 5 keys and 18 values
> [main] WARN org.apache.ratis.metrics.MetricRegistries - Found multiple 
> MetricRegistries implementations: class 
> org.apache.ratis.metrics.impl.MetricRegistriesImpl, class 
> org.apache.ratis.metrics.dropwizard3.Dm3MetricRegistriesImpl. Using first 
> found implementation: 
> org.apache.ratis.metrics.impl.MetricRegistriesImpl@1e67a849
> Transferring leadership to peer n1 with address 127.0.0.1:10124
> Transferring leadership initiated{code}
> h3. Backward compatibility
> Ratis shell version: {{{}3.0.0-SNAPSHOT{}}}.
> Ratis server version: {{{}2.4.1{}}}.
> {code:java}
> $ bin/ratis sh election transfer -peers 
> 127.0.0.1:10024,127.0.0.1:10124,127.0.0.1:11124 -address 127.0.0.1:10124
> [main] INFO org.reflections.Reflections - Reflections took 135 ms to scan 1 
> urls, producing 5 keys and 18 values
> [main] WARN org.apache.ratis.metrics.MetricRegistries - Found multiple 
> MetricRegistries implementations: class 
> org.apache.ratis.metrics.impl.MetricRegistriesImpl, class 
> org.apache.ratis.metrics.dropwizard3.Dm3MetricRegistriesImpl. Using first 
> found implementation: 
> org.apache.ratis.metrics.impl.MetricRegistriesImpl@1e67a849
> Transferring leadership to peer n1 with address 127.0.0.1:10124
> Changing priority of peer n1 with address 127.0.0.1:10124 to 4
> Transferring leadership to peer n1 with address 127.0.0.1:10124
> Changing priority of peer n1 with address 127.0.0.1:10124 to 5
> Transferring leadership initiated{code}
> h3. In case of failure
> In most cases, just a retry will fix the problem. And users can also set 
> timeout manually by {{-timeout}} option.
> {code:java}
> $ bin/ratis sh election transfer -peers 
> 127.0.0.1:10024,127.0.0.1:10124,127.0.0.1:11124 -address 127.0.0.1:10124
> [main] INFO org.reflections.Reflections - Reflections took 135 ms to scan 1 
> urls, producing 5 keys and 18 values
> [main] WARN org.apache.ratis.metrics.MetricRegistries - Found multiple 
> MetricRegistries implementations: class 
> org.apache.ratis.metrics.impl.MetricRegistriesImpl, class 
> org.apache.ratis.metrics.dropwizard3.Dm3MetricRegistriesImpl. Using first 
> found implementation: 
> org.apache.ratis.metrics.impl.MetricRegistriesImpl@1e67a849
> Transferring leadership to peer n1 with address 127.0.0.1:10124
> Failed to transfer peer n1 with address 127.0.0.1:10124: 
> org.apache.ratis.protocol.exceptions.TransferLeadershipException: 
> n2@group-ABB3109A44C1: Failed to transfer leadership to n1 (timed out 
> 3000ms): current leader is n2
>       at 
> org.apache.ratis.server.impl.TransferLeadership$PendingRequest.complete(TransferLeadership.java:67)
>       at 
> org.apache.ratis.server.impl.TransferLeadership.lambda$finish$7(TransferLeadership.java:163)
>       at java.util.Optional.ifPresent(Optional.java:159)
>       at 
> org.apache.ratis.server.impl.TransferLeadership.finish(TransferLeadership.java:163)
>       at 
> org.apache.ratis.server.impl.TransferLeadership.lambda$start$4(TransferLeadership.java:136)
>       at 
> org.apache.ratis.util.TimeoutTimer.lambda$onTimeout$2(TimeoutTimer.java:101)
>       at org.apache.ratis.util.LogUtils.runAndLog(LogUtils.java:38)
>       at org.apache.ratis.util.LogUtils$1.run(LogUtils.java:79)
>       at org.apache.ratis.util.TimeoutTimer$Task.run(TimeoutTimer.java:55)
>       at java.util.TimerThread.mainLoop(Timer.java:555)
>       at java.util.TimerThread.run(Timer.java:505){code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to