[ https://issues.apache.org/jira/browse/HDDS-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773027#comment-16773027 ]
Shashikant Banerjee commented on HDDS-1124: ------------------------------------------- 172.27.76.72 being a leader sends append request to follower node 172.27.54.212 . The follower's next index should be 3048 . Meanwhile leader election gets triggered and hence 172.27.54.212 also write a config entry top its log at index 3048 as well as 172.27.76.72 writes a conf entry at index 3048. {code:java} 2019-02-15 10:15:53,319 INFO org.apache.ratis.server.impl.RoleInfo: a40a7b01-a30b-469c-b373-9fcb20a126ed: start FollowerState 2019-02-15 10:15:53,319 INFO org.apache.ratis.server.impl.FollowerState: a40a7b01-a30b-469c-b373-9fcb20a126ed: FollowerState was interrupted: java.lang.InterruptedException: sleep interrupted 2019-02-15 10:15:53,463 INFO org.apache.ratis.server.impl.RaftServerImpl: a40a7b01-a30b-469c-b373-9fcb20a126ed: change Leader from null to 943007c8-4fdd-4926-89e2-2c8c52c05073 at term 3 for appendEntries, leader elected after 1364ms 2019-02-15 10:15:53,591 INFO org.apache.ratis.server.impl.RaftServerImpl: a40a7b01-a30b-469c-b373-9fcb20a126ed: set configuration 3048: [a40a7b01-a30b-469c-b373-9fcb20a126ed:172.27.54.212:9858, 8c77b16b-8054-49e3-b669-1ff759cfd271:172.27.23.196:9858, 943007c8-4fdd-4926-89e2-2c8c52c05073:172.27.76.72:9858], old=null at 3048 2019-02-15 10:15:53,594 INFO org.apache.ratis.server.storage.RaftLogWorker: a40a7b01-a30b-469c-b373-9fcb20a126ed-RaftLogWorker: Truncating segments toTruncate: (3036, 3048) isOpen? true, length=11529, newEndIndex=3047 toDelete: [], start index 3048 2019-02-15 10:15:53,594 INFO org.apache.ratis.server.storage.RaftLogWorker: a40a7b01-a30b-469c-b373-9fcb20a126ed-RaftLogWorker: Starting segment from index:3048 2019-02-15 10:15:53,597 INFO org.apache.ratis.server.storage.RaftLogWorker: a40a7b01-a30b-469c-b373-9fcb20a126ed-RaftLogWorker: Truncated log file /data/disk1/ozone/meta/ratis/01d3ef2a-912c-4fc0-80b6-012343d76adb/current/log_inprogress_3036 to length 11529 and moved it to /data/disk1/ozone/meta/ratis/01d3ef2a-912c-4fc0-80b6-012343d76adb/current/log_3036-3047{code} When the appendEntries reply gets received on 172.27.76.72, since the follower node has incremented the next index to 3049 (as a conf entry is written at index 3048), but requestor assumes it to be 3048, the assertion is hit. {code:java} final long replyNextIndex = reply.getNextIndex(); final long lastIndex = replyNextIndex - 1; final boolean updateMatchIndex; if (request.getEntriesCount() == 0) { Preconditions.assertTrue(!request.hasPreviousLog() || lastIndex == request.getPreviousLog().getIndex(), "reply's next index is %s, request's previous is %s", replyNextIndex, request.getPreviousLog()); updateMatchIndex = request.hasPreviousLog() && follower.getMatchIndex() < lastIndex; } e {code} This is quite possible in the system and not a fatal bug. > java.lang.IllegalStateException exception in datanode log > --------------------------------------------------------- > > Key: HDDS-1124 > URL: https://issues.apache.org/jira/browse/HDDS-1124 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Reporter: Nilotpal Nandi > Assignee: Shashikant Banerjee > Priority: Major > > steps taken : > -------------------- > # created 12 datanodes cluster and running workload on all the nodes > exception seen : > --------------------------- > > {noformat} > 2019-02-15 10:15:53,355 INFO org.apache.ratis.server.storage.RaftLogWorker: > 943007c8-4fdd-4926-89e2-2c8c52c05073-RaftLogWorker: Rolled log segment from > /data/disk1/ozone/meta/ratis/01d3ef2a-912c-4fc0-80b6-012343d76adb/current/log_inprogress_3036 > to > /data/disk1/ozone/meta/ratis/01d3ef2a-912c-4fc0-80b6-012343d76adb/current/log_3036-3047 > 2019-02-15 10:15:53,367 INFO org.apache.ratis.server.impl.RaftServerImpl: > 943007c8-4fdd-4926-89e2-2c8c52c05073: set configuration 3048: > [a40a7b01-a30b-469c-b373-9fcb20a126ed:172.27.54.212:9858, > 8c77b16b-8054-49e3-b669-1ff759cfd271:172.27.23.196:9858, > 943007c8-4fdd-4926-89e2-2c8c52c05073:172.27.76.72:9858], old=null at 3048 > 2019-02-15 10:15:53,523 INFO org.apache.ratis.server.storage.RaftLogWorker: > 943007c8-4fdd-4926-89e2-2c8c52c05073-RaftLogWorker: created new log segment > /data/disk1/ozone/meta/ratis/01d3ef2a-912c-4fc0-80b6-012343d76adb/current/log_inprogress_3048 > 2019-02-15 10:15:53,580 ERROR org.apache.ratis.grpc.server.GrpcLogAppender: > Failed onNext serverReply { > requestorId: "943007c8-4fdd-4926-89e2-2c8c52c05073" > replyId: "a40a7b01-a30b-469c-b373-9fcb20a126ed" > raftGroupId { > id: "\001\323\357*\221,O\300\200\266\001#C\327j\333" > } > success: true > } > term: 3 > nextIndex: 3049 > followerCommit: 3047 > java.lang.IllegalStateException: reply's next index is 3049, request's > previous is term: 1 > index: 3047 > at org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:60) > at > org.apache.ratis.grpc.server.GrpcLogAppender.onSuccess(GrpcLogAppender.java:285) > at > org.apache.ratis.grpc.server.GrpcLogAppender$AppendLogResponseHandler.onNextImpl(GrpcLogAppender.java:230) > at > org.apache.ratis.grpc.server.GrpcLogAppender$AppendLogResponseHandler.onNext(GrpcLogAppender.java:215) > at > org.apache.ratis.grpc.server.GrpcLogAppender$AppendLogResponseHandler.onNext(GrpcLogAppender.java:197) > at > org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onMessage(ClientCalls.java:421) > at > org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33) > at > org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33) > at > org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInContext(ClientCallImpl.java:519) > at > org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) > at > org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2019-02-15 10:15:56,442 INFO org.apache.ratis.server.storage.RaftLogWorker: > 943007c8-4fdd-4926-89e2-2c8c52c05073-RaftLogWorker: Rolling segment > log-3048_3066 to index:3066 > 2019-02-15 10:15:56,442 INFO org.apache.ratis.server.storage.RaftLogWorker: > 943007c8-4fdd-4926-89e2-2c8c52c05073-RaftLogWorker: Rolled log segment from > /data/disk1/ozone/meta/ratis/01d3ef2a-912c-4fc0-80b6-012343d76adb/current/log_inprogress_3048 > to > /data/disk1/ozone/meta/ratis/01d3ef2a-912c-4fc0-80b6-012343d76adb/current/log_3048-3066 > 2019-02-15 10:15:56,564 INFO org.apache.ratis.server.storage.RaftLogWorker: > 943007c8-4fdd-4926-89e2-2c8c52c05073-RaftLogWorker: created new log segment > /data/disk1/ozone/meta/ratis/01d3ef2a-912c-4fc0-80b6-012343d76adb/current/log_inprogress_3067 > 2019-02-15 10:16:45,420 INFO org.apache.ratis.server.storage.RaftLogWorker: > 943007c8-4fdd-4926-89e2-2c8c52c05073-RaftLogWorker: Rolling segment > log-3067_3077 to index:3077 > {noformat} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org