[ 
https://issues.apache.org/jira/browse/RATIS-677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932236#comment-16932236
 ] 

Lokesh Jain commented on RATIS-677:
-----------------------------------

[~Sammi] Was there any change in the ratis versions when the ozone binaries 
were updated?

[~szetszwo] I agree that the log segment can not be corrected. My concern was 
around the usage of these log segments. We should make sure that the 
corresponding entries and log segment is considered as corrupted in later 
operations.

> Logentry marked corrupt due to ChecksumException
> ------------------------------------------------
>
>                 Key: RATIS-677
>                 URL: https://issues.apache.org/jira/browse/RATIS-677
>             Project: Ratis
>          Issue Type: Bug
>          Components: server
>            Reporter: Sammi Chen
>            Assignee: Tsz Wo Nicholas Sze
>            Priority: Blocker
>         Attachments: r677_20190913.patch
>
>
> Steps:
> 1.  Run Teragen and generated a few GB data in a 4 datanodes cluster.  
> 2.  Stoped the datanodes through ./stop-ozone.sh.
> 3.  Changed the ozone binaries
> 4.  Start the cluster through ./start-ozone.sh.
> 5.  Two datanode regisisterd to SCM. Two datanode fail to appear at SCM side. 
>  
> Checked these two failed node, datanode process is still running. In the 
> logfile, I found a lot of following errors. 
> 2019-09-12 21:06:45,255 [Datanode State Machine Thread - 0] INFO       - 
> Starting XceiverServerRatis ba17ad5e-714e-4d82-85d8-ff2e0737fcf9 at port 9858
> 2019-09-12 21:06:47,255 [Datanode State Machine Thread - 0] INFO       - 
> Attempting to start container services.
> 2019-09-12 21:06:47,255 [Datanode State Machine Thread - 0] INFO       - 
> Background container scanner has been disabled.
> 2019-09-12 21:06:47,255 [Datanode State Machine Thread - 0] INFO       - 
> Starting XceiverServerRatis ba17ad5e-714e-4d82-85d8-ff2e0737fcf9 at port 9858
> 2019-09-12 21:06:47,255 [Datanode State Machine Thread - 0] ERROR      - 
> Unable to communicate to SCM server at 10.120.110.183:9861 for past 2100 
> seconds.
> org.apache.ratis.protocol.ChecksumException: LogEntry is corrupt. Calculated 
> checksum is -134141393 but read checksum 0
>         at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogReader.decodeEntry(SegmentedRaftLogReader.java:299)
>         at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogReader.readEntry(SegmentedRaftLogReader.java:185)
>         at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogInputStream.nextEntry(SegmentedRaftLogInputStream.java:121)
>         at 
> org.apache.ratis.server.raftlog.segmented.LogSegment.readSegmentFile(LogSegment.java:94)
>         at 
> org.apache.ratis.server.raftlog.segmented.LogSegment.loadSegment(LogSegment.java:117)
>         at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogCache.loadSegment(SegmentedRaftLogCache.java:310)
>         at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.loadLogSegments(SegmentedRaftLog.java:234)
>         at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.openImpl(SegmentedRaftLog.java:204)
>         at org.apache.ratis.server.raftlog.RaftLog.open(RaftLog.java:247)
>         at 
> org.apache.ratis.server.impl.ServerState.initRaftLog(ServerState.java:190)
>         at 
> org.apache.ratis.server.impl.ServerState.<init>(ServerState.java:120)
>         at 
> org.apache.ratis.server.impl.RaftServerImpl.<init>(RaftServerImpl.java:110)
>         at 
> org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$2(RaftServerProxy.java:208)
>         at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to