adoroszlai opened a new pull request, #643:
URL: https://github.com/apache/ratis/pull/643

   ## What changes were proposed in this pull request?
   
   After commit 7167fafe, Ozone SCM HA fails to start due to the following error in the follower:
   
   ```
   2022-05-09 15:06:39,907 [grpc-default-executor-0] INFO  impl.SnapshotInstallationHandler (SnapshotInstallationHandler.java:installSnapshot(79)) - 2d9f16d2-1d71-4978-9546-00aa402e881d@group-F84A6C219907: receive installSnapshot: edb17d6c-0f2e-4b73-aa2b-eb8fdf376958->2d9f16d2-1d71-4978-9546-00aa402e881d#0-t2,notify:(t:1, i:0)
   2022-05-09 15:06:39,909 [grpc-default-executor-0] INFO  server.RaftServer$Division (ServerState.java:setLeader(287)) - 2d9f16d2-1d71-4978-9546-00aa402e881d@group-F84A6C219907: change Leader from null to edb17d6c-0f2e-4b73-aa2b-eb8fdf376958 at term 2 for installSnapshot, leader elected after 488ms
   2022-05-09 15:06:39,910 [grpc-default-executor-0] INFO  impl.SnapshotInstallationHandler (SnapshotInstallationHandler.java:notifyStateMachineToInstallSnapshot(210)) - 2d9f16d2-1d71-4978-9546-00aa402e881d@group-F84A6C219907: Received notification to install snapshot at index 0
   2022-05-09 15:06:39,914 [grpc-default-executor-0] INFO  impl.SnapshotInstallationHandler (SnapshotInstallationHandler.java:notifyStateMachineToInstallSnapshot(243)) - 2d9f16d2-1d71-4978-9546-00aa402e881d@group-F84A6C219907: notifyInstallSnapshot: nextIndex is 0 but the leader's first available index is 0.
   2022-05-09 15:06:39,917 [grpc-default-executor-0] ERROR impl.SnapshotInstallationHandler (SnapshotInstallationHandler.java:installSnapshot(86)) - 2d9f16d2-1d71-4978-9546-00aa402e881d@group-F84A6C219907: installSnapshot failed
   java.lang.IllegalStateException: inProgressInstallSnapshotRequest: 0 is not eligible, firstAvailableLogIndex: 0
           at org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:60)
           at org.apache.ratis.server.impl.SnapshotInstallationHandler.notifyStateMachineToInstallSnapshot(SnapshotInstallationHandler.java:287)
           at org.apache.ratis.server.impl.SnapshotInstallationHandler.installSnapshotImpl(SnapshotInstallationHandler.java:115)
           at org.apache.ratis.server.impl.SnapshotInstallationHandler.installSnapshot(SnapshotInstallationHandler.java:84)
           at org.apache.ratis.server.impl.RaftServerImpl.installSnapshot(RaftServerImpl.java:1427)
   ```
   
   Note that this patch fixes the problem, but I'm not at all sure this is the right way to fix it.
   
   https://issues.apache.org/jira/browse/RATIS-1577
   
   ## How was this patch tested?
   
   ### Regular Ratis CI
   
   https://github.com/adoroszlai/incubator-ratis/actions/runs/2290702895
   
   ### Integration with Ozone
   
   Built local Ratis snapshot:
   
   ```
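   # Derive a unique snapshot version from the short commit hash
   ratis_hash=$(git rev-parse --short HEAD)
   ratis_version="2.3.0-${ratis_hash}-SNAPSHOT"
   # Set that version in all POMs, then build and install into the local Maven repository
   mvn versions:set -DnewVersion="${ratis_version}"
   mvn -DskipTests clean install
   # Revert the POM changes and remove untracked files left by versions:set (e.g. backup POMs)
   git reset --hard
   git clean -fd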
   ```
   
   Applied patch to Ozone (needed for compatibility with Ratis `master`):
   
   ```diff
   diff --git hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/hdds/scm/TestRatisPipelineLeader.java hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/hdds/scm/TestRatisPipelineLeader.java
   index 0cb3e6553..6a0a615a3 100644
   --- hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/hdds/scm/TestRatisPipelineLeader.java
   +++ hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/hdds/scm/TestRatisPipelineLeader.java
   @@ -34,9 +34,6 @@
    import org.apache.ozone.test.tag.Flaky;
    
    import static org.apache.hadoop.ozone.OzoneConfigKeys.DFS_RATIS_LEADER_ELECTION_MINIMUM_TIMEOUT_DURATION_KEY;
   -import org.apache.log4j.Level;
   -import org.apache.log4j.Logger;
   -import org.apache.ratis.grpc.client.GrpcClientProtocolService;
    import org.apache.ratis.protocol.ClientId;
    import org.apache.ratis.protocol.GroupInfoReply;
    import org.apache.ratis.protocol.GroupInfoRequest;
   @@ -46,7 +43,9 @@
    import org.junit.jupiter.api.BeforeAll;
    import org.junit.jupiter.api.Test;
    import org.junit.jupiter.api.Timeout;
   +import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
   +import org.slf4j.event.Level;
    
    /**
     * Test pipeline leader information is correctly used.
   @@ -55,7 +54,7 @@
    public class TestRatisPipelineLeader {
      private static MiniOzoneCluster cluster;
      private static OzoneConfiguration conf;
   -  private static final org.slf4j.Logger LOG =
   +  private static final Logger LOG =
          LoggerFactory.getLogger(TestRatisPipelineLeader.class);
    
      @BeforeAll
   @@ -97,9 +96,11 @@ public void testLeaderIdUsedOnFirstCall() throws Exception {
        // Verify client connects to Leader without NotLeaderException
        XceiverClientRatis xceiverClientRatis =
            XceiverClientRatis.newXceiverClientRatis(ratisPipeline, conf);
   -    Logger.getLogger(GrpcClientProtocolService.class).setLevel(Level.DEBUG);
   +    final Logger log = LoggerFactory.getLogger(
   +        "org.apache.ratis.grpc.server.GrpcClientProtocolService");
   +    GenericTestUtils.setLogLevel(log, Level.DEBUG);
        GenericTestUtils.LogCapturer logCapturer =
   -        GenericTestUtils.LogCapturer.captureLogs(GrpcClientProtocolService.LOG);
   +        GenericTestUtils.LogCapturer.captureLogs(log);
        xceiverClientRatis.connect();
        ContainerProtocolCalls.createContainer(xceiverClientRatis, 1L, null);
        logCapturer.stopCapturing();
   diff --git hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/freon/FollowerAppendLogEntryGenerator.java hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/freon/FollowerAppendLogEntryGenerator.java
   index e30b778a1..07d0db800 100644
   --- hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/freon/FollowerAppendLogEntryGenerator.java
   +++ hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/freon/FollowerAppendLogEntryGenerator.java
   @@ -336,7 +336,7 @@ private void configureGroup() throws IOException {
                .build());
        RaftClient client = RaftClient.newBuilder()
            .setClientId(clientId)
   -        .setProperties(new RaftProperties(true))
   +        .setProperties(new RaftProperties())
            .setRaftGroup(group)
            .build();
    
   diff --git hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/freon/LeaderAppendLogEntryGenerator.java hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/freon/LeaderAppendLogEntryGenerator.java
   index 971e6f7e4..8943ca1fa 100644
   --- hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/freon/LeaderAppendLogEntryGenerator.java
   +++ hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/freon/LeaderAppendLogEntryGenerator.java
   @@ -265,7 +265,7 @@ private void configureGroup() throws IOException {
                .build());
        RaftClient client = RaftClient.newBuilder()
            .setClientId(clientId)
   -        .setProperties(new RaftProperties(true))
   +        .setProperties(new RaftProperties())
            .setRaftGroup(group)
            .build();
    
   ```
   
   Ran Ozone test:
   
   ```
   mvn -Dskip.installnpm -Dskip.installnpx -Dskip.installyarn -Dskip.npm -Dskip.npx -Dskip.yarn -DskipShade \
     -am -pl :ozone-integration-test -Dsurefire.fork.timeout=120 -DfailIfNoTests=false -Dtest=TestStorageContainerManagerHA#testAllSCMAreRunning \
     -Dratis.version="$ratis_version" \
     -Dratis.thirdparty.version=1.0.0 -Dgrpc.protobuf-compile.version=3.19.2 -Dnetty.version=4.1.74.Final -Dio.grpc.version=1.44.0 -Dtcnative.version=2.0.48.Final \
     clean test
   ```
   
   ```
   [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 65.43 s - in org.apache.hadoop.ozone.scm.TestStorageContainerManagerHA
   ```