[ 
https://issues.apache.org/jira/browse/HDDS-11352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17875896#comment-17875896
 ] 

Ethan Rose commented on HDDS-11352:
-----------------------------------

Thanks for the quick fix. We can try repeated runs of this test when Ratis 
3.1.1 or 3.2.0 is brought into Ozone.

> Intermittent Raft Log Corruption in TestOzoneManagerHAWithStoppedNodes
> ----------------------------------------------------------------------
>
>                 Key: HDDS-11352
>                 URL: https://issues.apache.org/jira/browse/HDDS-11352
>             Project: Apache Ozone
>          Issue Type: Sub-task
>          Components: Ozone Manager
>            Reporter: Ethan Rose
>            Priority: Critical
>         Attachments: it-om.zip
>
>
> Failure observed in [this 
> run|https://github.com/apache/ozone/actions/runs/10484629833/job/29039668567] 
> in {{TestOzoneManagerHAWithStoppedNodes#testListVolumes}}, but may not be 
> specific to that test in particular.
> {code}
> -------------------------------------------------------------------------------
> Test set: org.apache.hadoop.ozone.om.TestOzoneManagerHAWithStoppedNodes
> -------------------------------------------------------------------------------
> Tests run: 12, Failures: 0, Errors: 5, Skipped: 0, Time elapsed: 621.712 s <<< FAILURE! - in org.apache.hadoop.ozone.om.TestOzoneManagerHAWithStoppedNodes
> org.apache.hadoop.ozone.om.TestOzoneManagerHAWithStoppedNodes.twoOMDown  Time elapsed: 18.461 s  <<< ERROR!
> java.util.concurrent.CompletionException: java.lang.IllegalStateException: omNode-1@group-523986131536: Failed to initRaftLog.
>       at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:332)
>       at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:347)
>       at java.base/java.util.concurrent.CompletableFuture$BiRelay.tryFire(CompletableFuture.java:1498)
>       at java.base/java.util.concurrent.CompletableFuture$CoCompletion.tryFire(CompletableFuture.java:1219)
>       at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
>       at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162)
>       at org.apache.ratis.util.ConcurrentUtils.accept(ConcurrentUtils.java:206)
>       at org.apache.ratis.util.ConcurrentUtils.lambda$null$4(ConcurrentUtils.java:182)
>       at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>       at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
>       at java.base/java.lang.Thread.run(Thread.java:840)
> Caused by: java.lang.IllegalStateException: omNode-1@group-523986131536: Failed to initRaftLog.
>       at org.apache.ratis.server.impl.ServerState.initRaftLog(ServerState.java:171)
>       at org.apache.ratis.server.impl.ServerState.lambda$new$6(ServerState.java:131)
>       at org.apache.ratis.util.MemoizedSupplier.get(MemoizedSupplier.java:63)
>       at org.apache.ratis.server.impl.ServerState.initialize(ServerState.java:148)
>       at org.apache.ratis.server.impl.RaftServerImpl.start(RaftServerImpl.java:385)
>       at org.apache.ratis.util.ConcurrentUtils.accept(ConcurrentUtils.java:203)
>       ... 4 more
> Caused by: org.apache.ratis.protocol.exceptions.ChecksumException: Log entry corrupted: Calculated checksum is 3AB532B2 but read checksum is 31120F6C.
>       at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogReader.decodeEntry(SegmentedRaftLogReader.java:319)
>       at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogReader.readEntry(SegmentedRaftLogReader.java:204)
>       at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogInputStream.nextEntry(SegmentedRaftLogInputStream.java:131)
>       at org.apache.ratis.server.raftlog.segmented.LogSegment.readSegmentFile(LogSegment.java:138)
>       at org.apache.ratis.server.raftlog.segmented.LogSegment.loadSegment(LogSegment.java:172)
>       at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogCache.loadSegment(SegmentedRaftLogCache.java:428)
>       at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.loadLogSegments(SegmentedRaftLog.java:258)
>       at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.openImpl(SegmentedRaftLog.java:231)
>       at org.apache.ratis.server.raftlog.RaftLogBase.open(RaftLogBase.java:273)
>       at org.apache.ratis.server.impl.ServerState.initRaftLog(ServerState.java:194)
>       at org.apache.ratis.server.impl.ServerState.initRaftLog(ServerState.java:168)
>       ... 9 more
> org.apache.hadoop.ozone.om.TestOzoneManagerHAWithStoppedNodes.testListVolumes  Time elapsed: 121.075 s  <<< ERROR!
> {code}


