[ 
https://issues.apache.org/jira/browse/HDDS-10059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17807494#comment-17807494
 ] 

Hemant Kumar commented on HDDS-10059:
-------------------------------------

[~xbis] my worry with the new approach is whether we are missing any file that 
was not downloaded from the leader to the follower. Ideally, after 
bootstrapping, the follower should have the same files as the leader in both 
activeFS and snapshot, unless some compaction happens on the leader's 
*activeFS*, and that only matters for {*}activeFS{*}. Even if compaction 
happens, *leader.snapshot* should match *follower.snapshot*, but in your 
example there is a difference between *leader.snapshot* and 
{*}follower.snapshot{*}.

1. In the previous sync, you mentioned that 
*leader_node/db.checkpoints/om.db_checkpoint_* didn't have the files in the 
first place. Is that correct?
2. For the test, to rule out compaction of activeFS, we can pause 
compaction (if possible) after the checkpoint is created on the leader and 
verify whether the issue persists.
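
To check whether any file is missing on the follower, the two directories could be compared by file name. The following is only a minimal sketch under stated assumptions: the class and method names ({{SnapshotFileDiff}}, {{missingOnFollower}}) are hypothetical and not part of Ozone, it looks only at {{.sst}} files directly under each directory, and it compares names rather than inodes or contents.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of the Ozone code base.
public class SnapshotFileDiff {

  /** Names of the regular .sst files directly under dir (non-recursive). */
  static SortedSet<String> sstFileNames(Path dir) throws IOException {
    try (Stream<Path> files = Files.list(dir)) {
      return files
          .filter(Files::isRegularFile)
          .map(p -> p.getFileName().toString())
          .filter(name -> name.endsWith(".sst"))
          .collect(Collectors.toCollection(TreeSet::new));
    }
  }

  /** .sst file names present in leaderDir but absent from followerDir. */
  static SortedSet<String> missingOnFollower(Path leaderDir, Path followerDir)
      throws IOException {
    SortedSet<String> missing = sstFileNames(leaderDir);
    missing.removeAll(sstFileNames(followerDir));
    return missing;
  }
}
```

Calling {{missingOnFollower}} once with the leader checkpoint directory and the follower's om.db, and once with the two snapshot directories, would show whether the difference is limited to activeFS. An empty result means every leader {{.sst}} name also exists on the follower.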

I think [~georgeJahad] would be a better person to comment on the new approach, 
since he mostly worked on the bootstrapping code.

> [disabled] Intermittent failure in TestOMRatisSnapshots.testInstallSnapshot
> ---------------------------------------------------------------------------
>
>                 Key: HDDS-10059
>                 URL: https://issues.apache.org/jira/browse/HDDS-10059
>             Project: Apache Ozone
>          Issue Type: Sub-task
>          Components: test
>            Reporter: Attila Doroszlai
>            Assignee: Christos Bisias
>            Priority: Major
>
> Failure 1:
> {code:title=https://github.com/adoroszlai/ozone-build-results/blob/master/2023/12/30/27977/it-om/hadoop-ozone/integration-test/org.apache.hadoop.ozone.om.TestOMRatisSnapshots.txt}
> org.apache.hadoop.ozone.om.TestOMRatisSnapshots.testInstallSnapshot(int, Path)[1] -- Time elapsed: 90.79 s <<< ERROR!
> java.io.IOException: snapshot directory doesn't exist
>       at org.apache.hadoop.ozone.om.TestOMRatisSnapshots.createOzoneSnapshot(TestOMRatisSnapshots.java:1060)
>       at org.apache.hadoop.ozone.om.TestOMRatisSnapshots.testInstallSnapshot(TestOMRatisSnapshots.java:238)
> {code}
> Failure 2:
> {code:title=https://github.com/adoroszlai/ozone-build-results/blob/master/2024/01/03/28076/it-om/hadoop-ozone/integration-test/org.apache.hadoop.ozone.om.TestOMRatisSnapshots.txt}
> org.apache.hadoop.ozone.om.TestOMRatisSnapshots.testInstallSnapshot(int, Path)[1] -- Time elapsed: 85.34 s <<< ERROR!
> java.nio.file.NoSuchFileException: /home/runner/work/ozone/ozone/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-3a4cbda0-c8c0-415c-b2a8-04058ca404e1/omNode-3/om.db/000786.sst
>       at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
>       at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>       at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>       at sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
>       at sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:144)
>       at sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99)
>       at java.nio.file.Files.readAttributes(Files.java:1737)
>       at org.apache.hadoop.ozone.om.snapshot.OmSnapshotUtils.getINode(OmSnapshotUtils.java:67)
>       at org.apache.hadoop.ozone.om.TestOMRatisSnapshots.checkSnapshot(TestOMRatisSnapshots.java:373)
>       at org.apache.hadoop.ozone.om.TestOMRatisSnapshots.testInstallSnapshot(TestOMRatisSnapshots.java:312)
> {code}
> [~xBis] would you like to take a look?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
