[
https://issues.apache.org/jira/browse/HDDS-10059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17807494#comment-17807494
]
Hemant Kumar commented on HDDS-10059:
-------------------------------------
[~xbis] my worry with the new approach is that we may be missing files that
were never downloaded from the leader to the follower. Ideally, after
bootstrapping, the follower should have the same files as the leader in both
activeFS and snapshot, unless some compaction happens on the leader's
*activeFS*; that only matters for *activeFS*. Even if compaction happens,
*leader.snapshot* should match *follower.snapshot*, but in your example
*leader.snapshot* and *follower.snapshot* differ.
1. In the previous sync, you mentioned that
*leader_node/db.checkpoints/om.db_checkpoint_* didn't have the files in the
first place. Is that correct?
2. For the test, to rule out compaction of activeFS, we can pause compaction
(if possible) after the checkpoint is created on the leader and verify whether
the issue persists.
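To make the expectation above concrete (after bootstrapping, the follower's
snapshot directory should hold the same files as the leader's), here is a
minimal sketch of such a check using only java.nio. The class and method names
(SstDirDiff, sstNames, missingOnFollower) are hypothetical and not part of
Ozone, and it assumes the SST files sit directly under the two directories
being compared.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not Ozone code: compares SST file names in two dirs.
final class SstDirDiff {

  private SstDirDiff() { }

  /** Returns the names of the regular *.sst files directly under dir. */
  static Set<String> sstNames(Path dir) throws IOException {
    try (Stream<Path> files = Files.list(dir)) {
      return files
          .filter(Files::isRegularFile)
          .map(p -> p.getFileName().toString())
          .filter(name -> name.endsWith(".sst"))
          .collect(Collectors.toCollection(TreeSet::new));
    }
  }

  /** Returns SST names present under leaderDir but absent under followerDir. */
  static Set<String> missingOnFollower(Path leaderDir, Path followerDir)
      throws IOException {
    Set<String> missing = new TreeSet<>(sstNames(leaderDir));
    missing.removeAll(sstNames(followerDir));
    return missing;
  }
}
```

Running this against the leader's and follower's snapshot directories right
after the install should return an empty set if nothing was skipped during the
download; a non-empty result would name the files that never reached the
follower.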
I think [~georgeJahad] would be a better person to comment on the new approach,
since he has mostly worked on the bootstrapping code.
> [disabled] Intermittent failure in TestOMRatisSnapshots.testInstallSnapshot
> ---------------------------------------------------------------------------
>
> Key: HDDS-10059
> URL: https://issues.apache.org/jira/browse/HDDS-10059
> Project: Apache Ozone
> Issue Type: Sub-task
> Components: test
> Reporter: Attila Doroszlai
> Assignee: Christos Bisias
> Priority: Major
>
> Failure 1:
> {code:title=https://github.com/adoroszlai/ozone-build-results/blob/master/2023/12/30/27977/it-om/hadoop-ozone/integration-test/org.apache.hadoop.ozone.om.TestOMRatisSnapshots.txt}
> org.apache.hadoop.ozone.om.TestOMRatisSnapshots.testInstallSnapshot(int, Path)[1] -- Time elapsed: 90.79 s <<< ERROR!
> java.io.IOException: snapshot directory doesn't exist
> at org.apache.hadoop.ozone.om.TestOMRatisSnapshots.createOzoneSnapshot(TestOMRatisSnapshots.java:1060)
> at org.apache.hadoop.ozone.om.TestOMRatisSnapshots.testInstallSnapshot(TestOMRatisSnapshots.java:238)
> {code}
> Failure 2:
> {code:title=https://github.com/adoroszlai/ozone-build-results/blob/master/2024/01/03/28076/it-om/hadoop-ozone/integration-test/org.apache.hadoop.ozone.om.TestOMRatisSnapshots.txt}
> org.apache.hadoop.ozone.om.TestOMRatisSnapshots.testInstallSnapshot(int, Path)[1] -- Time elapsed: 85.34 s <<< ERROR!
> java.nio.file.NoSuchFileException: /home/runner/work/ozone/ozone/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-3a4cbda0-c8c0-415c-b2a8-04058ca404e1/omNode-3/om.db/000786.sst
> at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
> at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
> at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
> at sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
> at sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:144)
> at sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99)
> at java.nio.file.Files.readAttributes(Files.java:1737)
> at org.apache.hadoop.ozone.om.snapshot.OmSnapshotUtils.getINode(OmSnapshotUtils.java:67)
> at org.apache.hadoop.ozone.om.TestOMRatisSnapshots.checkSnapshot(TestOMRatisSnapshots.java:373)
> at org.apache.hadoop.ozone.om.TestOMRatisSnapshots.testInstallSnapshot(TestOMRatisSnapshots.java:312)
> {code}
> [~xBis] would you like to take a look?
--
This message was sent by Atlassian Jira
(v8.20.10#820010)