[ 
https://issues.apache.org/jira/browse/HDDS-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773027#comment-16773027
 ] 

Shashikant Banerjee commented on HDDS-1124:
-------------------------------------------

172.27.76.72 being a leader sends append request to follower node 172.27.54.212 
. The follower's next index should be 3048

. Meanwhile leader election gets triggered and hence 172.27.54.212 also write a 
config entry top its log at index 3048 as well as 172.27.76.72 writes a conf 
entry at index 3048.

 
{code:java}
2019-02-15 10:15:53,319 INFO org.apache.ratis.server.impl.RoleInfo: 
a40a7b01-a30b-469c-b373-9fcb20a126ed: start FollowerState

2019-02-15 10:15:53,319 INFO org.apache.ratis.server.impl.FollowerState: 
a40a7b01-a30b-469c-b373-9fcb20a126ed: FollowerState was interrupted: 
java.lang.InterruptedException: sleep interrupted

2019-02-15 10:15:53,463 INFO org.apache.ratis.server.impl.RaftServerImpl: 
a40a7b01-a30b-469c-b373-9fcb20a126ed: change Leader from null to 
943007c8-4fdd-4926-89e2-2c8c52c05073 at term 3 for appendEntries, leader 
elected after 1364ms

2019-02-15 10:15:53,591 INFO org.apache.ratis.server.impl.RaftServerImpl: 
a40a7b01-a30b-469c-b373-9fcb20a126ed: set configuration 3048: 
[a40a7b01-a30b-469c-b373-9fcb20a126ed:172.27.54.212:9858, 
8c77b16b-8054-49e3-b669-1ff759cfd271:172.27.23.196:9858, 
943007c8-4fdd-4926-89e2-2c8c52c05073:172.27.76.72:9858], old=null at 3048

2019-02-15 10:15:53,594 INFO org.apache.ratis.server.storage.RaftLogWorker: 
a40a7b01-a30b-469c-b373-9fcb20a126ed-RaftLogWorker: Truncating segments 
toTruncate: (3036, 3048) isOpen? true, length=11529, newEndIndex=3047

  toDelete: [], start index 3048

2019-02-15 10:15:53,594 INFO org.apache.ratis.server.storage.RaftLogWorker: 
a40a7b01-a30b-469c-b373-9fcb20a126ed-RaftLogWorker: Starting segment from 
index:3048

2019-02-15 10:15:53,597 INFO org.apache.ratis.server.storage.RaftLogWorker: 
a40a7b01-a30b-469c-b373-9fcb20a126ed-RaftLogWorker: Truncated log file 
/data/disk1/ozone/meta/ratis/01d3ef2a-912c-4fc0-80b6-012343d76adb/current/log_inprogress_3036
 to length 11529 and moved it to 
/data/disk1/ozone/meta/ratis/01d3ef2a-912c-4fc0-80b6-012343d76adb/current/log_3036-3047{code}
When the appendEntries reply gets received on 172.27.76.72, since the follower 
node has incremented the next index to 3049 (as a conf entry is written at 
index 3048), but requestor assumes it to be 3048, the assertion is hit.
{code:java}
final long replyNextIndex = reply.getNextIndex();
final long lastIndex = replyNextIndex - 1;
final boolean updateMatchIndex;

if (request.getEntriesCount() == 0) {
  Preconditions.assertTrue(!request.hasPreviousLog() ||
          lastIndex == request.getPreviousLog().getIndex(),
      "reply's next index is %s, request's previous is %s",
      replyNextIndex, request.getPreviousLog());
  updateMatchIndex = request.hasPreviousLog() && follower.getMatchIndex() < 
lastIndex;
} e
{code}
This is quite possible in the system and not a fatal bug.

 

 

> java.lang.IllegalStateException exception in datanode log
> ---------------------------------------------------------
>
>                 Key: HDDS-1124
>                 URL: https://issues.apache.org/jira/browse/HDDS-1124
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>            Reporter: Nilotpal Nandi
>            Assignee: Shashikant Banerjee
>            Priority: Major
>
> steps taken :
> --------------------
>  # created 12 datanodes cluster and running workload on all the nodes
> exception seen :
> ---------------------------
>  
> {noformat}
> 2019-02-15 10:15:53,355 INFO org.apache.ratis.server.storage.RaftLogWorker: 
> 943007c8-4fdd-4926-89e2-2c8c52c05073-RaftLogWorker: Rolled log segment from 
> /data/disk1/ozone/meta/ratis/01d3ef2a-912c-4fc0-80b6-012343d76adb/current/log_inprogress_3036
>  to 
> /data/disk1/ozone/meta/ratis/01d3ef2a-912c-4fc0-80b6-012343d76adb/current/log_3036-3047
> 2019-02-15 10:15:53,367 INFO org.apache.ratis.server.impl.RaftServerImpl: 
> 943007c8-4fdd-4926-89e2-2c8c52c05073: set configuration 3048: 
> [a40a7b01-a30b-469c-b373-9fcb20a126ed:172.27.54.212:9858, 
> 8c77b16b-8054-49e3-b669-1ff759cfd271:172.27.23.196:9858, 
> 943007c8-4fdd-4926-89e2-2c8c52c05073:172.27.76.72:9858], old=null at 3048
> 2019-02-15 10:15:53,523 INFO org.apache.ratis.server.storage.RaftLogWorker: 
> 943007c8-4fdd-4926-89e2-2c8c52c05073-RaftLogWorker: created new log segment 
> /data/disk1/ozone/meta/ratis/01d3ef2a-912c-4fc0-80b6-012343d76adb/current/log_inprogress_3048
> 2019-02-15 10:15:53,580 ERROR org.apache.ratis.grpc.server.GrpcLogAppender: 
> Failed onNext serverReply {
>  requestorId: "943007c8-4fdd-4926-89e2-2c8c52c05073"
>  replyId: "a40a7b01-a30b-469c-b373-9fcb20a126ed"
>  raftGroupId {
>  id: "\001\323\357*\221,O\300\200\266\001#C\327j\333"
>  }
>  success: true
> }
> term: 3
> nextIndex: 3049
> followerCommit: 3047
> java.lang.IllegalStateException: reply's next index is 3049, request's 
> previous is term: 1
> index: 3047
> at org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:60)
>  at 
> org.apache.ratis.grpc.server.GrpcLogAppender.onSuccess(GrpcLogAppender.java:285)
>  at 
> org.apache.ratis.grpc.server.GrpcLogAppender$AppendLogResponseHandler.onNextImpl(GrpcLogAppender.java:230)
>  at 
> org.apache.ratis.grpc.server.GrpcLogAppender$AppendLogResponseHandler.onNext(GrpcLogAppender.java:215)
>  at 
> org.apache.ratis.grpc.server.GrpcLogAppender$AppendLogResponseHandler.onNext(GrpcLogAppender.java:197)
>  at 
> org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onMessage(ClientCalls.java:421)
>  at 
> org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33)
>  at 
> org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33)
>  at 
> org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInContext(ClientCallImpl.java:519)
>  at 
> org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
>  at 
> org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> 2019-02-15 10:15:56,442 INFO org.apache.ratis.server.storage.RaftLogWorker: 
> 943007c8-4fdd-4926-89e2-2c8c52c05073-RaftLogWorker: Rolling segment 
> log-3048_3066 to index:3066
> 2019-02-15 10:15:56,442 INFO org.apache.ratis.server.storage.RaftLogWorker: 
> 943007c8-4fdd-4926-89e2-2c8c52c05073-RaftLogWorker: Rolled log segment from 
> /data/disk1/ozone/meta/ratis/01d3ef2a-912c-4fc0-80b6-012343d76adb/current/log_inprogress_3048
>  to 
> /data/disk1/ozone/meta/ratis/01d3ef2a-912c-4fc0-80b6-012343d76adb/current/log_3048-3066
> 2019-02-15 10:15:56,564 INFO org.apache.ratis.server.storage.RaftLogWorker: 
> 943007c8-4fdd-4926-89e2-2c8c52c05073-RaftLogWorker: created new log segment 
> /data/disk1/ozone/meta/ratis/01d3ef2a-912c-4fc0-80b6-012343d76adb/current/log_inprogress_3067
> 2019-02-15 10:16:45,420 INFO org.apache.ratis.server.storage.RaftLogWorker: 
> 943007c8-4fdd-4926-89e2-2c8c52c05073-RaftLogWorker: Rolling segment 
> log-3067_3077 to index:3077
> {noformat}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

Reply via email to