[
https://issues.apache.org/jira/browse/HDDS-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773027#comment-16773027
]
Shashikant Banerjee commented on HDDS-1124:
-------------------------------------------
172.27.76.72 being a leader sends append request to follower node 172.27.54.212
. The follower's next index should be 3048
. Meanwhile leader election gets triggered and hence 172.27.54.212 also write a
config entry top its log at index 3048 as well as 172.27.76.72 writes a conf
entry at index 3048.
{code:java}
2019-02-15 10:15:53,319 INFO org.apache.ratis.server.impl.RoleInfo:
a40a7b01-a30b-469c-b373-9fcb20a126ed: start FollowerState
2019-02-15 10:15:53,319 INFO org.apache.ratis.server.impl.FollowerState:
a40a7b01-a30b-469c-b373-9fcb20a126ed: FollowerState was interrupted:
java.lang.InterruptedException: sleep interrupted
2019-02-15 10:15:53,463 INFO org.apache.ratis.server.impl.RaftServerImpl:
a40a7b01-a30b-469c-b373-9fcb20a126ed: change Leader from null to
943007c8-4fdd-4926-89e2-2c8c52c05073 at term 3 for appendEntries, leader
elected after 1364ms
2019-02-15 10:15:53,591 INFO org.apache.ratis.server.impl.RaftServerImpl:
a40a7b01-a30b-469c-b373-9fcb20a126ed: set configuration 3048:
[a40a7b01-a30b-469c-b373-9fcb20a126ed:172.27.54.212:9858,
8c77b16b-8054-49e3-b669-1ff759cfd271:172.27.23.196:9858,
943007c8-4fdd-4926-89e2-2c8c52c05073:172.27.76.72:9858], old=null at 3048
2019-02-15 10:15:53,594 INFO org.apache.ratis.server.storage.RaftLogWorker:
a40a7b01-a30b-469c-b373-9fcb20a126ed-RaftLogWorker: Truncating segments
toTruncate: (3036, 3048) isOpen? true, length=11529, newEndIndex=3047
toDelete: [], start index 3048
2019-02-15 10:15:53,594 INFO org.apache.ratis.server.storage.RaftLogWorker:
a40a7b01-a30b-469c-b373-9fcb20a126ed-RaftLogWorker: Starting segment from
index:3048
2019-02-15 10:15:53,597 INFO org.apache.ratis.server.storage.RaftLogWorker:
a40a7b01-a30b-469c-b373-9fcb20a126ed-RaftLogWorker: Truncated log file
/data/disk1/ozone/meta/ratis/01d3ef2a-912c-4fc0-80b6-012343d76adb/current/log_inprogress_3036
to length 11529 and moved it to
/data/disk1/ozone/meta/ratis/01d3ef2a-912c-4fc0-80b6-012343d76adb/current/log_3036-3047{code}
When the appendEntries reply gets received on 172.27.76.72, since the follower
node has incremented the next index to 3049 (as a conf entry is written at
index 3048), but requestor assumes it to be 3048, the assertion is hit.
{code:java}
final long replyNextIndex = reply.getNextIndex();
final long lastIndex = replyNextIndex - 1;
final boolean updateMatchIndex;
if (request.getEntriesCount() == 0) {
Preconditions.assertTrue(!request.hasPreviousLog() ||
lastIndex == request.getPreviousLog().getIndex(),
"reply's next index is %s, request's previous is %s",
replyNextIndex, request.getPreviousLog());
updateMatchIndex = request.hasPreviousLog() && follower.getMatchIndex() <
lastIndex;
} e
{code}
This is quite possible in the system and not a fatal bug.
> java.lang.IllegalStateException exception in datanode log
> ---------------------------------------------------------
>
> Key: HDDS-1124
> URL: https://issues.apache.org/jira/browse/HDDS-1124
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Reporter: Nilotpal Nandi
> Assignee: Shashikant Banerjee
> Priority: Major
>
> steps taken :
> --------------------
> # created 12 datanodes cluster and running workload on all the nodes
> exception seen :
> ---------------------------
>
> {noformat}
> 2019-02-15 10:15:53,355 INFO org.apache.ratis.server.storage.RaftLogWorker:
> 943007c8-4fdd-4926-89e2-2c8c52c05073-RaftLogWorker: Rolled log segment from
> /data/disk1/ozone/meta/ratis/01d3ef2a-912c-4fc0-80b6-012343d76adb/current/log_inprogress_3036
> to
> /data/disk1/ozone/meta/ratis/01d3ef2a-912c-4fc0-80b6-012343d76adb/current/log_3036-3047
> 2019-02-15 10:15:53,367 INFO org.apache.ratis.server.impl.RaftServerImpl:
> 943007c8-4fdd-4926-89e2-2c8c52c05073: set configuration 3048:
> [a40a7b01-a30b-469c-b373-9fcb20a126ed:172.27.54.212:9858,
> 8c77b16b-8054-49e3-b669-1ff759cfd271:172.27.23.196:9858,
> 943007c8-4fdd-4926-89e2-2c8c52c05073:172.27.76.72:9858], old=null at 3048
> 2019-02-15 10:15:53,523 INFO org.apache.ratis.server.storage.RaftLogWorker:
> 943007c8-4fdd-4926-89e2-2c8c52c05073-RaftLogWorker: created new log segment
> /data/disk1/ozone/meta/ratis/01d3ef2a-912c-4fc0-80b6-012343d76adb/current/log_inprogress_3048
> 2019-02-15 10:15:53,580 ERROR org.apache.ratis.grpc.server.GrpcLogAppender:
> Failed onNext serverReply {
> requestorId: "943007c8-4fdd-4926-89e2-2c8c52c05073"
> replyId: "a40a7b01-a30b-469c-b373-9fcb20a126ed"
> raftGroupId {
> id: "\001\323\357*\221,O\300\200\266\001#C\327j\333"
> }
> success: true
> }
> term: 3
> nextIndex: 3049
> followerCommit: 3047
> java.lang.IllegalStateException: reply's next index is 3049, request's
> previous is term: 1
> index: 3047
> at org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:60)
> at
> org.apache.ratis.grpc.server.GrpcLogAppender.onSuccess(GrpcLogAppender.java:285)
> at
> org.apache.ratis.grpc.server.GrpcLogAppender$AppendLogResponseHandler.onNextImpl(GrpcLogAppender.java:230)
> at
> org.apache.ratis.grpc.server.GrpcLogAppender$AppendLogResponseHandler.onNext(GrpcLogAppender.java:215)
> at
> org.apache.ratis.grpc.server.GrpcLogAppender$AppendLogResponseHandler.onNext(GrpcLogAppender.java:197)
> at
> org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onMessage(ClientCalls.java:421)
> at
> org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33)
> at
> org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33)
> at
> org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInContext(ClientCallImpl.java:519)
> at
> org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
> at
> org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2019-02-15 10:15:56,442 INFO org.apache.ratis.server.storage.RaftLogWorker:
> 943007c8-4fdd-4926-89e2-2c8c52c05073-RaftLogWorker: Rolling segment
> log-3048_3066 to index:3066
> 2019-02-15 10:15:56,442 INFO org.apache.ratis.server.storage.RaftLogWorker:
> 943007c8-4fdd-4926-89e2-2c8c52c05073-RaftLogWorker: Rolled log segment from
> /data/disk1/ozone/meta/ratis/01d3ef2a-912c-4fc0-80b6-012343d76adb/current/log_inprogress_3048
> to
> /data/disk1/ozone/meta/ratis/01d3ef2a-912c-4fc0-80b6-012343d76adb/current/log_3048-3066
> 2019-02-15 10:15:56,564 INFO org.apache.ratis.server.storage.RaftLogWorker:
> 943007c8-4fdd-4926-89e2-2c8c52c05073-RaftLogWorker: created new log segment
> /data/disk1/ozone/meta/ratis/01d3ef2a-912c-4fc0-80b6-012343d76adb/current/log_inprogress_3067
> 2019-02-15 10:16:45,420 INFO org.apache.ratis.server.storage.RaftLogWorker:
> 943007c8-4fdd-4926-89e2-2c8c52c05073-RaftLogWorker: Rolling segment
> log-3067_3077 to index:3077
> {noformat}
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]