[ 
https://issues.apache.org/jira/browse/RATIS-203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351112#comment-16351112
 ] 

Tsz Wo Nicholas Sze commented on RATIS-203:
-------------------------------------------

Thanks Lokesh.  You are right that we should fail the requests of truncated 
entries.  

Some comments on the patch:

- Let's add the failClientRequest below to RaftServerImpl.  Then, all changes 
to public can be avoided.
{code}
//RaftServerImpl
  public void failClientRequest(LogEntryProto logEntry) {
    if (logEntry.getLogEntryBodyCase() == 
LogEntryProto.LogEntryBodyCase.SMLOGENTRY) {
      final ClientId clientId = ClientId.valueOf(logEntry.getClientId());
      final RetryCache.CacheEntry cacheEntry = getRetryCache().get(clientId, 
logEntry.getCallId());
      if (cacheEntry != null) {
        final RaftClientReply reply = new RaftClientReply(clientId, getId(), 
getGroupId(),
            logEntry.getCallId(), false, null, generateNotLeaderException());
        cacheEntry.failWithReply(reply);
      }
    }
  }
}
{code}
- Once we have failClientRequest, the code remaining in SegmentedRaftLog 
becomes quite short.  Let's just add them to SegmentedRaftLog.append directly
{code}
// SegmentedRaftLog.append
          // fails the truncated requests
          for(; storedEntry != null; ) {
            try {
              final LogEntryProto entry = get(storedEntry.getIndex());
              server.failClientRequest(entry);
            } catch (RaftLogIOException e) {
              LOG.error("Failed to read log " + storedEntry, e);
            }

            if (iter.hasNext()) {
              storedEntry = iter.next();
            }
          }
{code}
- The test change does not seem testing the new feature (failing the truncated 
requerests).  Is it the case?


> RaftClient request may wait indefinitely for a reply
> ----------------------------------------------------
>
>                 Key: RATIS-203
>                 URL: https://issues.apache.org/jira/browse/RATIS-203
>             Project: Ratis
>          Issue Type: Bug
>            Reporter: Lokesh Jain
>            Assignee: Lokesh Jain
>            Priority: Major
>         Attachments: RATIS-203.001.patch, RATIS-203.002.patch
>
>
> There are two scenarios which can lead to such a situation.
>  # Raft leader accepts an entry and puts it in its retry cache. But it fails 
> or changes before it can append the entries to its follower. In such a case 
> the entry would remain in its retry cache. If this leader is chosen again by 
> the ring, any subsequent request by the raft client would wait indefinitely 
> for the future in retry cache entry to complete.
>  # The leader receives the request but dies before replying to it.
> Below are the log entries corresponding to first scenario.
> {code:java}
> 2018-01-25 16:28:55,479 DEBUG impl.RaftServerImpl 
> (LeaderState.java:addPendingRequest(239)) - s2: addPendingRequest at 
> index=3255, request=RaftClientRequest(client-C496AD50C41A->s2) in 
> group-8A72B1078A40, cid=3554, seq=293 RW, 322d323933
> 2018-01-25 16:28:57,457 DEBUG impl.RaftServerImpl 
> (RaftServerImpl.java:submitClientRequestAsync(476)) - s2: receive client 
> request(RaftClientRequest(client-C496AD50C41A->s2) in group-8A72B1078A40, 
> cid=3554, seq=293 RW, 322d323933) future client-C496AD50C41A:3554:pending 
> passedjava.util.concurrent.CompletableFuture@404d76c7[Not completed]
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to