sashapolo commented on code in PR #4624:
URL: https://github.com/apache/ignite-3/pull/4624#discussion_r1816472755


##########
modules/cluster-management/src/main/java/org/apache/ignite/internal/cluster/management/MetaStorageInfo.java:
##########
@@ -53,4 +43,11 @@ public interface MetaStorageInfo extends NetworkMessage, 
Serializable {
      * a trace here.
      */
     @Nullable Long metastorageRepairingConfigIndex();
+
+    /**
+     * Returns whether MG was being repaired in this cluster incarnation.

Review Comment:
   ```suggestion
        * Returns whether MG was repaired in this cluster incarnation.
   ```



##########
modules/cluster-management/src/main/java/org/apache/ignite/internal/cluster/management/raft/commands/ChangeMetaStorageInfoCommand.java:
##########
@@ -34,12 +33,6 @@ public interface ChangeMetaStorageInfoCommand extends 
WriteCommand {
      */
     Set<String> metaStorageNodes();
 
-    /**
-     * Raft index in the Metastorage group under which the forced 
configuration is (or will be) saved, or {@code null} if no MG

Review Comment:
   Javadocs for this and the next methods are mixed up



##########
modules/metastorage/src/main/java/org/apache/ignite/internal/metastorage/impl/MetaStorageManagerImpl.java:
##########
@@ -378,15 +381,85 @@ private void finishRecovery(long targetRevision) {
         }
     }
 
-    private CompletableFuture<MetaStorageServiceImpl> 
initializeMetaStorage(MetaStorageInfo metaStorageInfo) {
+    private CompletableFuture<MetaStorageServiceImpl> 
reenterIfNeededAndInitializeMetaStorage(
+            MetaStorageInfo metaStorageInfo,
+            UUID currentClusterId
+    ) {
+        CompletableFuture<Void> reentryFuture = nullCompletedFuture();
+
+        if (thisNodeDidNotWitnessMetaStorageRepair(metaStorageInfo, 
currentClusterId)) {
+            reentryFuture = 
tryReenteringMetastorage(metaStorageInfo.metaStorageNodes(), currentClusterId);
+        }
+
+        return reentryFuture.thenCompose(unused -> 
initializeMetastorage(metaStorageInfo));
+    }
+
+    private boolean thisNodeDidNotWitnessMetaStorageRepair(MetaStorageInfo 
metaStorageInfo, UUID currentClusterId) {
+        UUID locallyWitnessedRepairClusterId = 
metastorageRepairStorage.readWitnessedMetastorageRepairClusterId();
+
+        return metaStorageInfo.metastorageRepairedInThisClusterIncarnation()
+                && !Objects.equals(locallyWitnessedRepairClusterId, 
currentClusterId);
+    }
+
+    private CompletableFuture<Void> tryReenteringMetastorage(Set<String> 
metastorageNodes, UUID currentClusterId) {
+        LOG.info("Trying to reenter Metastorage group");
+
+        return validateMetastorageForDivergence(metastorageNodes)
+                .thenRun(() -> {

Review Comment:
   ```suggestion
                   .thenRun(() -> prepareMetaStorageReentry(currentClusterId));
   ```



##########
modules/metastorage/src/main/java/org/apache/ignite/internal/metastorage/server/persistence/RocksDbKeyValueStorage.java:
##########
@@ -1317,6 +1328,33 @@ public long checksum(long revision) {
         }
     }
 
+    @Override
+    public ChecksumAndRevisions checksumAndRevisions(long revision) {
+        rwLock.readLock().lock();
+
+        try {
+            return new ChecksumAndRevisions(
+                    checksumByRevisionOrZero(revision),
+                    minChecksummedRevisionOrZero(),
+                    rev
+            );
+        } catch (RocksDBException e) {
+            throw new MetaStorageException(INTERNAL_ERR, "Cannot get checksum 
by revision: " + revision, e);
+        } finally {
+            rwLock.readLock().unlock();
+        }
+    }
+
+    private long minChecksummedRevisionOrZero() {
+        try (
+                var options = new ReadOptions();

Review Comment:
   shall we add a `setTailing(true)` option here? We don't need to create a 
database snapshot.



##########
modules/metastorage/src/main/java/org/apache/ignite/internal/metastorage/command/GetChecksumCommand.java:
##########
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.metastorage.command;
+
+import org.apache.ignite.internal.network.annotations.Transferable;
+import org.apache.ignite.internal.raft.ReadCommand;
+
+/**
+ * Get command for MetaStorageCommandListener that retrieves information about 
a checksum.

Review Comment:
   It would be helpful to add a link to `ChecksumInfo` here. "information about 
a checksum" is a bit too vague on its own



##########
modules/metastorage/src/main/java/org/apache/ignite/internal/metastorage/command/response/ChecksumInfo.java:
##########
@@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.metastorage.command.response;
+
+import java.io.Serializable;
+import org.apache.ignite.internal.tostring.S;
+
+/**
+ * Information about checksum for a revision of the Metastorage.
+ */
+public class ChecksumInfo implements Serializable {

Review Comment:
   Why is this class `Serializable`?



##########
modules/metastorage/src/main/java/org/apache/ignite/internal/metastorage/impl/MetaStorageManagerImpl.java:
##########
@@ -378,15 +381,85 @@ private void finishRecovery(long targetRevision) {
         }
     }
 
-    private CompletableFuture<MetaStorageServiceImpl> 
initializeMetaStorage(MetaStorageInfo metaStorageInfo) {
+    private CompletableFuture<MetaStorageServiceImpl> 
reenterIfNeededAndInitializeMetaStorage(
+            MetaStorageInfo metaStorageInfo,
+            UUID currentClusterId
+    ) {
+        CompletableFuture<Void> reentryFuture = nullCompletedFuture();

Review Comment:
   I think this is excessive, you just call `initializeMetaStorage` directly



##########
modules/metastorage/src/main/java/org/apache/ignite/internal/metastorage/command/response/ChecksumInfo.java:
##########
@@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.metastorage.command.response;
+
+import java.io.Serializable;
+import org.apache.ignite.internal.tostring.S;
+
+/**
+ * Information about checksum for a revision of the Metastorage.
+ */
+public class ChecksumInfo implements Serializable {

Review Comment:
   I think this class and corresponding commands should be called 
`RevisionInfo`, because this is information about a requested revision, not a 
checksum



##########
modules/metastorage/src/integrationTest/java/org/apache/ignite/internal/metastorage/impl/ItMetaStorageMultipleNodesAbstractTest.java:
##########
@@ -107,6 +107,12 @@ KeyValueStorage createStorage(String nodeName, Path path, 
ReadOperationForCompac
         return new SimpleInMemoryKeyValueStorage(nodeName, 
readOperationForCompactionTracker);
     }
 
+    void startMetastorageOn(List<Node> nodes) {
+        for (Node node : nodes) {
+            assertThat(node.metaStorageManager.startAsync(new 
ComponentContext()), willCompleteSuccessfully());

Review Comment:
   I think we can cache the `ComponentContext` at least outside the cycle or as 
a field



##########
modules/metastorage/src/main/java/org/apache/ignite/internal/metastorage/server/persistence/RocksDbKeyValueStorage.java:
##########
@@ -1317,6 +1328,33 @@ public long checksum(long revision) {
         }
     }
 
+    @Override
+    public ChecksumAndRevisions checksumAndRevisions(long revision) {
+        rwLock.readLock().lock();
+
+        try {
+            return new ChecksumAndRevisions(
+                    checksumByRevisionOrZero(revision),
+                    minChecksummedRevisionOrZero(),
+                    rev
+            );
+        } catch (RocksDBException e) {
+            throw new MetaStorageException(INTERNAL_ERR, "Cannot get checksum 
by revision: " + revision, e);
+        } finally {
+            rwLock.readLock().unlock();
+        }
+    }
+
+    private long minChecksummedRevisionOrZero() {
+        try (
+                var options = new ReadOptions();
+                RocksIterator it = revisionToChecksum.newIterator(options)
+        ) {
+            it.seekToFirst();
+            return it.isValid() ? bytesToLong(it.key()) : 0;

Review Comment:
   you also need to call `it.status()`



##########
modules/metastorage/src/main/java/org/apache/ignite/internal/metastorage/impl/MetastorageDivergedException.java:
##########
@@ -0,0 +1,31 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.metastorage.impl;
+
+import org.apache.ignite.internal.lang.IgniteInternalException;
+import org.apache.ignite.lang.ErrorGroups;
+
+/**
+ * Thrown when Metastorage on a node doing Metastorage reentry has been found 
to be diverged with the leader.
+ */
+public class MetastorageDivergedException extends IgniteInternalException {

Review Comment:
   Is this exception expected to be propagated to the outside world?



##########
modules/system-disaster-recovery/src/integrationTest/java/org/apache/ignite/internal/disaster/system/ItMetastorageGroupDisasterRecoveryTest.java:
##########
@@ -252,4 +263,107 @@ private static String leaderName(RaftGroupService 
mgClient0) {
 
         return leader.consistentId();
     }
+
+    @Test
+    void 
migratesNodesThatSawNoReparationToNewClusterIfMetastorageDidNotDiverge() throws 
Exception {
+        startAndInitCluster(2, new int[]{0}, new int[]{1});
+        waitTillClusterStateIsSavedToVaultOnConductor(0);
+
+        ComponentWorkingDir msWorkDir0 = igniteImpl(0).metastorageWorkDir();
+        ComponentWorkingDir msWorkDir1 = igniteImpl(1).metastorageWorkDir();
+
+        IntStream.of(0, 1).parallel().forEach(cluster::stopNode);
+
+        // Copy Metastorage state from old leader (1) to future leader (0) to 
make sure that 1 is not ahead of 0 and there will be
+        // no Metastorage divergence when we make 0 new leader and migrate 1 
to cluster again.
+        copyMetastorageState(msWorkDir1, msWorkDir0);
+
+        // Repair MG with just node 0 in CMG.
+        cluster.startEmbeddedNode(0);
+        initiateMgRepairVia(((IgniteServerImpl) 
cluster.server(0)).igniteImpl(), 1, 0);
+        IgniteImpl restartedIgniteImpl0 = waitTillNodeRestartsInternally(0);
+        waitTillMgHasMajority(restartedIgniteImpl0);
+
+        // Starting the node that did not see the repair.
+        migrate(1, 0);
+
+        LogicalTopologySnapshot topologySnapshot = 
restartedIgniteImpl0.logicalTopologyService().logicalTopologyOnLeader().get(10, 
SECONDS);
+        assertTopologyContainsNode(1, topologySnapshot);
+    }
+
+    private static void copyMetastorageState(ComponentWorkingDir source, 
ComponentWorkingDir dest) throws IOException {
+        replaceDir(source.dbPath(), dest.dbPath());
+        replaceDir(source.raftLogPath(), dest.raftLogPath());
+
+        String pathToSnapshots = "metastorage_group_0/snapshot";
+        replaceDir(source.metaPath().resolve(pathToSnapshots), 
dest.metaPath().resolve(pathToSnapshots));
+    }
+
+    private static void replaceDir(Path sourceDir, Path destDir) throws 
IOException {
+        assertTrue(Files.isDirectory(sourceDir));
+        assertTrue(Files.isDirectory(destDir));
+
+        assertTrue(IgniteUtils.deleteIfExists(destDir));
+        Files.createDirectory(destDir);
+        copyDir(sourceDir, destDir);
+    }
+
+    private static void copyDir(Path src, Path dest) throws IOException {
+        try (Stream<Path> stream = Files.walk(src)) {
+            stream.forEach(source -> copyFile(source, 
dest.resolve(src.relativize(source))));
+        }
+    }
+
+    private static void copyFile(Path source, Path dest) {
+        try {
+            Files.copy(source, dest, REPLACE_EXISTING);
+        } catch (Exception e) {
+            throw new RuntimeException(e.getMessage(), e);
+        }
+    }
+
+    @Test
+    void detectsMetastorageDivergence() throws Exception {
+        startAndInitCluster(2, new int[]{0}, new int[]{1});
+        waitTillClusterStateIsSavedToVaultOnConductor(0);
+
+        // Stopping node 0 to make sure it does not see the subsequent 
Metastorage write accepted by the leader (1).
+        // Later, we'll stop node 1, repair MG on 0, try to migrate 1 to the 
cluster, and, as 1 has a Metastorage put which 0 does not
+        // have, Metastorage divergence will have to be detected.
+        cluster.stopNode(0);
+
+        igniteImpl(1).metaStorageManager().put(new ByteArray("test-key"), new 
byte[]{42});
+
+        // This makes the MG majority go away.
+        cluster.stopNode(1);
+
+        cluster.startEmbeddedNode(0);
+        initiateMgRepairVia(((IgniteServerImpl) 
cluster.server(0)).igniteImpl(), 1, 0);
+        IgniteImpl restartedIgniteImpl0 = waitTillNodeRestartsInternally(0);
+        waitTillMgHasMajority(restartedIgniteImpl0);
+
+        // Starting the node that did not see the repair.
+        initiateMigration(1, 0);
+
+        assertThat(waitForRestartOrShutdownFuture(1), 
willCompleteSuccessfully());
+
+        // Attempt to migrate should fail.
+        assertThrowsWithCause(MetastorageDivergedException.class, "Metastorage 
has diverged", () -> cluster.server(1).api());
+
+        // Subsequent restart should also fail.
+        assertThrowsWithCause(MetastorageDivergedException.class, "Metastorage 
has diverged", () -> cluster.restartNode(1));
+    }
+
+    private static void assertThrowsWithCause(

Review Comment:
   What's wrong with `IgniteTestUtils#assertThrowsWithCause`?



##########
modules/metastorage/src/main/java/org/apache/ignite/internal/metastorage/impl/MetaStorageManagerImpl.java:
##########
@@ -378,15 +381,85 @@ private void finishRecovery(long targetRevision) {
         }
     }
 
-    private CompletableFuture<MetaStorageServiceImpl> 
initializeMetaStorage(MetaStorageInfo metaStorageInfo) {
+    private CompletableFuture<MetaStorageServiceImpl> 
reenterIfNeededAndInitializeMetaStorage(
+            MetaStorageInfo metaStorageInfo,
+            UUID currentClusterId
+    ) {
+        CompletableFuture<Void> reentryFuture = nullCompletedFuture();
+
+        if (thisNodeDidNotWitnessMetaStorageRepair(metaStorageInfo, 
currentClusterId)) {
+            reentryFuture = 
tryReenteringMetastorage(metaStorageInfo.metaStorageNodes(), currentClusterId);
+        }
+
+        return reentryFuture.thenCompose(unused -> 
initializeMetastorage(metaStorageInfo));
+    }
+
+    private boolean thisNodeDidNotWitnessMetaStorageRepair(MetaStorageInfo 
metaStorageInfo, UUID currentClusterId) {

Review Comment:
   Please add a javadoc explaining what "witnessing a cluster ID" means



##########
modules/raft/src/main/java/org/apache/ignite/internal/raft/server/impl/JraftServerImpl.java:
##########
@@ -571,8 +570,8 @@ public boolean stopRaftNodes(ReplicationGroupId groupId) {
     public void destroyRaftNodeStorages(RaftNodeId nodeId, RaftGroupOptions 
groupOptions) {
         // TODO: IGNITE-23079 - improve on what we do if it was not possible 
to destroy any of the storages.
         try {
-            String uri = nodeIdStr(nodeId);
-            groupOptions.getLogStorageFactory().destroyLogStorage(uri);
+            String logUri = nodeIdStr(nodeId);

Review Comment:
   What's this change for?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to