[ 
https://issues.apache.org/jira/browse/RATIS-1887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764983#comment-17764983
 ] 

GuoHao edited comment on RATIS-1887 at 9/14/23 8:26 AM:
--------------------------------------------------------

Thank you for your attention to this issue. 

 

Sorry [~szetszwo], my server had the log level of the 
`org.apache.ratis.server` package set to WARN, so I cannot find the relevant 
logs. The problem is also difficult to reproduce.

 

About this comment:
{quote}If not, the server seems to be killed right after 
[truncate|https://github.com/apache/ratis/blob/b8ce6d1f6ea37ed3ff9f6e888d2357fe48490567/ratis-server/src/main/java/org/apache/ratis/server/raftlog/segmented/SegmentedRaftLogWorker.java#L664]
 but before 
[move|https://github.com/apache/ratis/blob/b8ce6d1f6ea37ed3ff9f6e888d2357fe48490567/ratis-server/src/main/java/org/apache/ratis/server/raftlog/segmented/SegmentedRaftLogWorker.java#L670].
{quote}
I agree with you on that. For a possible fix, please see my proposal below.

 

Would it be possible to solve this problem by deleting the `ToDelete` segment 
logs first, and then truncating and renaming the `ToTruncate` segment log?

 

Like this:
{code:java}
// org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker.TruncateLog#execute

@Override
void execute() throws IOException {
  freeSegmentedRaftLogOutputStream();

  // Proposed change: delete the later (ToDelete) segments first.
  if (segments.getToDelete() != null && segments.getToDelete().length > 0) {
    long minStart = segments.getToDelete()[0].getStartIndex();
    for (SegmentFileInfo del : segments.getToDelete()) {
      final File delFile = del.getFile(storage);
      Preconditions.assertTrue(delFile.exists(),
          "File %s to be deleted does not exist", delFile);
      FileUtils.deleteFile(delFile);
      LOG.info("{}: Deleted log file {}", name, delFile);
      minStart = Math.min(minStart, del.getStartIndex());
    }
    if (segments.getToTruncate() == null) {
      lastWrittenIndex = minStart - 1;
    }
  }
  // Then truncate and rename the ToTruncate segment.
  if (segments.getToTruncate() != null) {
    final File fileToTruncate = segments.getToTruncate().getFile(storage);
    Preconditions.assertTrue(fileToTruncate.exists(),
        "File %s to be truncated does not exist", fileToTruncate);
    FileUtils.truncateFile(fileToTruncate,
        segments.getToTruncate().getTargetLength());

    // rename the file
    final File dstFile = segments.getToTruncate().getNewFile(storage);
    Preconditions.assertTrue(!dstFile.exists(),
        "Truncated file %s already exists ", dstFile);
    FileUtils.move(fileToTruncate, dstFile);
    LOG.info("{}: Truncated log file {} to length {} and moved it to {}", name,
        fileToTruncate, segments.getToTruncate().getTargetLength(), dstFile);

    // update lastWrittenIndex
    lastWrittenIndex = segments.getToTruncate().getNewEndIndex();
  }
 
  if (stateMachineFuture != null) {
    IOUtils.getFromFuture(stateMachineFuture,
        () -> this + "-truncateStateMachineData");
  }
  flushIndex.setUnconditionally(lastWrittenIndex, infoIndexChange);
  safeCacheEvictIndex.setUnconditionally(lastWrittenIndex, infoIndexChange);
  postUpdateFlushedIndex(0);
}{code}
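
To illustrate why the order of the file operations matters, here is a toy model. It is not Ratis code: it tracks only the start and end indices in the segment file names and "crashes" after a given number of file operations. The class `SegmentOrderSketch`, its methods, and the original end index 822559 of the truncated segment are assumptions made for illustration; only the names `log-784848_809981` and `log-822560_856038` come from the error log in this issue. The "rename-first" ordering is likewise an assumed ordering in which the truncated segment is renamed before the later segments are deleted.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of closed segment files, identified only by the indices in their names. Not Ratis code. */
final class SegmentOrderSketch {
  static final class Segment {
    final long start;
    final long end;

    Segment(long start, long end) {
      this.start = start;
      this.end = end;
    }

    @Override
    public String toString() {
      return "log-" + start + "_" + end;
    }
  }

  /** Returns a description of the first gap between consecutive segments, or null if they are contiguous. */
  static String firstGap(List<Segment> sorted) {
    for (int i = 1; i < sorted.size(); i++) {
      final Segment prev = sorted.get(i - 1);
      final Segment next = sorted.get(i);
      if (next.start != prev.end + 1) {
        return prev + " ended at " + prev.end + " but " + next + " started at " + next.start;
      }
    }
    return null;
  }

  /**
   * Applies the first {@code steps} file operations of a truncation to {@code truncateIndex}
   * and returns the file names left on disk, as if the process were killed at that point.
   * deleteFirst == true:  delete the later segments (newest first), then rename the truncated one.
   * deleteFirst == false: rename the truncated segment, then delete the later segments.
   */
  static List<Segment> crashAfter(List<Segment> disk, long truncateIndex, boolean deleteFirst, int steps) {
    final List<Segment> state = new ArrayList<>(disk);
    int t = -1; // position of the segment that contains truncateIndex
    for (int i = 0; i < state.size(); i++) {
      if (state.get(i).start <= truncateIndex && truncateIndex <= state.get(i).end) {
        t = i;
      }
    }
    if (t < 0) {
      throw new IllegalArgumentException("No segment contains index " + truncateIndex);
    }
    final int deletes = state.size() - 1 - t; // number of later segments to delete
    final int totalSteps = deletes + 1;       // the deletes plus one rename
    for (int step = 0; step < Math.min(steps, totalSteps); step++) {
      final boolean isRename = deleteFirst ? step == deletes : step == 0;
      if (isRename) {
        state.set(t, new Segment(state.get(t).start, truncateIndex));
      } else {
        state.remove(state.size() - 1); // delete the newest remaining segment
      }
    }
    return state;
  }

  public static void main(String[] args) {
    // Assumption: the truncated segment was originally named log-784848_822559.
    final List<Segment> disk = List.of(new Segment(784848, 822559), new Segment(822560, 856038));
    for (boolean deleteFirst : new boolean[] {false, true}) {
      for (int steps = 0; steps <= 2; steps++) {
        final List<Segment> after = crashAfter(disk, 809981, deleteFirst, steps);
        System.out.println((deleteFirst ? "delete-first" : "rename-first")
            + ", crash after " + steps + " step(s): " + after + ", gap: " + firstGap(after));
      }
    }
  }
}
```

In this model, the rename-first order leaves a gap in the file names if the process is killed after the rename but before the delete, while the delete-first order never does. The sketch does not model a crash between `truncateFile` and `move`, where the content of the file and its name would disagree, so that case still needs to be considered separately.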
 

[~szetszwo] [~Sammi] What do you think about this change?


> Gap between segement log
> ------------------------
>
>                 Key: RATIS-1887
>                 URL: https://issues.apache.org/jira/browse/RATIS-1887
>             Project: Ratis
>          Issue Type: Bug
>            Reporter: GuoHao
>            Priority: Critical
>         Attachments: image-2023-09-08-10-18-36-198.png
>
>
>  
> My version of Ratis already includes this 
> PR (https://issues.apache.org/jira/browse/RATIS-1763), and I am using a new 
> Raft server.
>  
> Description:
> 1. I am using Ratis version 2.5.1.
> 2. The application is Ozone 1.3.0 with SCM HA.
>  
> SCM error log:
> {code:java}
> Caused by: java.lang.IllegalStateException: Found a gap between logs: the last log segment log-784848_809981 ended at 809981 but the next log segment log-822560_856038 started at 822560
>         at org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:72)
>         at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogCache.validateAdding(SegmentedRaftLogCache.java:424)
>         at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogCache.addSegment(SegmentedRaftLogCache.java:431)
>         at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogCache.loadSegment(SegmentedRaftLogCache.java:384)
>         at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.loadLogSegments(SegmentedRaftLog.java:241)
>         at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.openImpl(SegmentedRaftLog.java:214)
>         at org.apache.ratis.server.raftlog.RaftLogBase.open(RaftLogBase.java:251)
>         at org.apache.ratis.server.impl.ServerState.initRaftLog(ServerState.java:239)
>         at org.apache.ratis.server.impl.ServerState.initRaftLog(ServerState.java:220)
>         at org.apache.ratis.server.impl.ServerState.lambda$new$5(ServerState.java:161)
>         at org.apache.ratis.util.MemoizedSupplier.get(MemoizedSupplier.java:62)
>         at org.apache.ratis.server.impl.ServerState.initialize(ServerState.java:177)
>         at org.apache.ratis.server.impl.RaftServerImpl.start(RaftServerImpl.java:338)
>         at org.apache.ratis.util.ConcurrentUtils.accept(ConcurrentUtils.java:188){code}
> segment log:
>  
> !image-2023-09-08-10-18-36-198.png!
>  
> The modification time of this segment log is later than the modification 
> time of the file with the larger index.
> The file size of this file seems to be larger than the other files, but it's 
> not as big as the other files.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
