adoroszlai opened a new pull request, #643:
URL: https://github.com/apache/ratis/pull/643
## What changes were proposed in this pull request?
After 7167fafe, Ozone SCM HA fails to start due to the following error in the
follower:
```
2022-05-09 15:06:39,907 [grpc-default-executor-0] INFO impl.SnapshotInstallationHandler (SnapshotInstallationHandler.java:installSnapshot(79)) - 2d9f16d2-1d71-4978-9546-00aa402e881d@group-F84A6C219907: receive installSnapshot: edb17d6c-0f2e-4b73-aa2b-eb8fdf376958->2d9f16d2-1d71-4978-9546-00aa402e881d#0-t2,notify:(t:1, i:0)
2022-05-09 15:06:39,909 [grpc-default-executor-0] INFO server.RaftServer$Division (ServerState.java:setLeader(287)) - 2d9f16d2-1d71-4978-9546-00aa402e881d@group-F84A6C219907: change Leader from null to edb17d6c-0f2e-4b73-aa2b-eb8fdf376958 at term 2 for installSnapshot, leader elected after 488ms
2022-05-09 15:06:39,910 [grpc-default-executor-0] INFO impl.SnapshotInstallationHandler (SnapshotInstallationHandler.java:notifyStateMachineToInstallSnapshot(210)) - 2d9f16d2-1d71-4978-9546-00aa402e881d@group-F84A6C219907: Received notification to install snapshot at index 0
2022-05-09 15:06:39,914 [grpc-default-executor-0] INFO impl.SnapshotInstallationHandler (SnapshotInstallationHandler.java:notifyStateMachineToInstallSnapshot(243)) - 2d9f16d2-1d71-4978-9546-00aa402e881d@group-F84A6C219907: notifyInstallSnapshot: nextIndex is 0 but the leader's first available index is 0.
2022-05-09 15:06:39,917 [grpc-default-executor-0] ERROR impl.SnapshotInstallationHandler (SnapshotInstallationHandler.java:installSnapshot(86)) - 2d9f16d2-1d71-4978-9546-00aa402e881d@group-F84A6C219907: installSnapshot failed
java.lang.IllegalStateException: inProgressInstallSnapshotRequest: 0 is not eligible, firstAvailableLogIndex: 0
    at org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:60)
    at org.apache.ratis.server.impl.SnapshotInstallationHandler.notifyStateMachineToInstallSnapshot(SnapshotInstallationHandler.java:287)
    at org.apache.ratis.server.impl.SnapshotInstallationHandler.installSnapshotImpl(SnapshotInstallationHandler.java:115)
    at org.apache.ratis.server.impl.SnapshotInstallationHandler.installSnapshot(SnapshotInstallationHandler.java:84)
    at org.apache.ratis.server.impl.RaftServerImpl.installSnapshot(RaftServerImpl.java:1427)
```
Note that this patch fixes the problem, but I am not at all sure that it is the
right way to fix it.
https://issues.apache.org/jira/browse/RATIS-1577
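
To make the failure mode in the log easier to follow, here is a minimal, self-contained sketch of how a formatted precondition check can produce exactly that exception message. It is not the Ratis code: the class `EligibilityCheckSketch`, the helper `assertTrue`, and in particular the predicate inside `checkEligible` are assumptions made for illustration only. The real condition in `SnapshotInstallationHandler` is not quoted in this description.

```java
public class EligibilityCheckSketch {

  /** Hypothetical stand-in for a precondition helper; not the Ratis class itself. */
  static void assertTrue(boolean condition, String format, Object... args) {
    if (!condition) {
      throw new IllegalStateException(String.format(format, args));
    }
  }

  /**
   * ASSUMED predicate, chosen only to reproduce the logged message:
   * the in-progress index must be strictly positive and must not exceed
   * the leader's first available log index.
   */
  static void checkEligible(long inProgressIndex, long firstAvailableLogIndex) {
    assertTrue(inProgressIndex > 0 && inProgressIndex <= firstAvailableLogIndex,
        "inProgressInstallSnapshotRequest: %s is not eligible, firstAvailableLogIndex: %s",
        inProgressIndex, firstAvailableLogIndex);
  }

  /** Returns the failure message, or null when the check passes. */
  static String failureMessage(long inProgressIndex, long firstAvailableLogIndex) {
    try {
      checkEligible(inProgressIndex, firstAvailableLogIndex);
      return null;
    } catch (IllegalStateException e) {
      return e.getMessage();
    }
  }

  public static void main(String[] args) {
    // Both indices are 0, as in the follower log: the assumed check rejects it.
    System.out.println(failureMessage(0, 0));
    // A positive index that does not exceed the leader's first index passes.
    System.out.println(failureMessage(1, 1));
  }
}
```

Under this assumed predicate, a fresh follower whose in-progress index and the leader's first available index are both 0 is rejected, which matches the symptom above. Whether 0 should be treated as a valid index or as a "nothing in progress" sentinel is the design question behind the fix.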
## How was this patch tested?
### Regular Ratis CI
https://github.com/adoroszlai/incubator-ratis/actions/runs/2290702895
### Integration with Ozone
Built local Ratis snapshot:
```
ratis_hash=$(git rev-parse --short HEAD)
ratis_version="2.3.0-${ratis_hash}-SNAPSHOT"
mvn versions:set -DnewVersion="${ratis_version}"
mvn -DskipTests clean install
git reset --hard
git clean -fd
```
Applied patch to Ozone (needed for compatibility with Ratis `master`):
```diff
diff --git hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/hdds/scm/TestRatisPipelineLeader.java hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/hdds/scm/TestRatisPipelineLeader.java
index 0cb3e6553..6a0a615a3 100644
--- hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/hdds/scm/TestRatisPipelineLeader.java
+++ hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/hdds/scm/TestRatisPipelineLeader.java
@@ -34,9 +34,6 @@
import org.apache.ozone.test.tag.Flaky;
import static org.apache.hadoop.ozone.OzoneConfigKeys.DFS_RATIS_LEADER_ELECTION_MINIMUM_TIMEOUT_DURATION_KEY;
-import org.apache.log4j.Level;
-import org.apache.log4j.Logger;
-import org.apache.ratis.grpc.client.GrpcClientProtocolService;
import org.apache.ratis.protocol.ClientId;
import org.apache.ratis.protocol.GroupInfoReply;
import org.apache.ratis.protocol.GroupInfoRequest;
@@ -46,7 +43,9 @@
import org.junit.jupiter.api.BeforeAll;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.Timeout;
+import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
+import org.slf4j.event.Level;
/**
* Test pipeline leader information is correctly used.
@@ -55,7 +54,7 @@
public class TestRatisPipelineLeader {
private static MiniOzoneCluster cluster;
private static OzoneConfiguration conf;
- private static final org.slf4j.Logger LOG =
+ private static final Logger LOG =
LoggerFactory.getLogger(TestRatisPipelineLeader.class);
@BeforeAll
@@ -97,9 +96,11 @@ public void testLeaderIdUsedOnFirstCall() throws Exception {
// Verify client connects to Leader without NotLeaderException
XceiverClientRatis xceiverClientRatis =
XceiverClientRatis.newXceiverClientRatis(ratisPipeline, conf);
- Logger.getLogger(GrpcClientProtocolService.class).setLevel(Level.DEBUG);
+ final Logger log = LoggerFactory.getLogger(
+ "org.apache.ratis.grpc.server.GrpcClientProtocolService");
+ GenericTestUtils.setLogLevel(log, Level.DEBUG);
GenericTestUtils.LogCapturer logCapturer =
- GenericTestUtils.LogCapturer.captureLogs(GrpcClientProtocolService.LOG);
+ GenericTestUtils.LogCapturer.captureLogs(log);
xceiverClientRatis.connect();
ContainerProtocolCalls.createContainer(xceiverClientRatis, 1L, null);
logCapturer.stopCapturing();
diff --git hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/freon/FollowerAppendLogEntryGenerator.java hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/freon/FollowerAppendLogEntryGenerator.java
index e30b778a1..07d0db800 100644
--- hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/freon/FollowerAppendLogEntryGenerator.java
+++ hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/freon/FollowerAppendLogEntryGenerator.java
@@ -336,7 +336,7 @@ private void configureGroup() throws IOException {
.build());
RaftClient client = RaftClient.newBuilder()
.setClientId(clientId)
- .setProperties(new RaftProperties(true))
+ .setProperties(new RaftProperties())
.setRaftGroup(group)
.build();
diff --git hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/freon/LeaderAppendLogEntryGenerator.java hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/freon/LeaderAppendLogEntryGenerator.java
index 971e6f7e4..8943ca1fa 100644
--- hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/freon/LeaderAppendLogEntryGenerator.java
+++ hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/freon/LeaderAppendLogEntryGenerator.java
@@ -265,7 +265,7 @@ private void configureGroup() throws IOException {
.build());
RaftClient client = RaftClient.newBuilder()
.setClientId(clientId)
- .setProperties(new RaftProperties(true))
+ .setProperties(new RaftProperties())
.setRaftGroup(group)
.build();
```
Ran Ozone test:
```
mvn -Dskip.installnpm -Dskip.installnpx -Dskip.installyarn -Dskip.npm -Dskip.npx -Dskip.yarn -DskipShade \
  -am -pl :ozone-integration-test -Dsurefire.fork.timeout=120 -DfailIfNoTests=false -Dtest=TestStorageContainerManagerHA#testAllSCMAreRunning \
  -Dratis.version="$ratis_version" \
  -Dratis.thirdparty.version=1.0.0 -Dgrpc.protobuf-compile.version=3.19.2 -Dnetty.version=4.1.74.Final -Dio.grpc.version=1.44.0 -Dtcnative.version=2.0.48.Final \
  clean test
```
```
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 65.43 s - in org.apache.hadoop.ozone.scm.TestStorageContainerManagerHA
```