[ 
https://issues.apache.org/jira/browse/HDDS-10738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17841377#comment-17841377
 ] 

Hemant Kumar edited comment on HDDS-10738 at 4/26/24 10:57 PM:
---------------------------------------------------------------

There was a leader election around 2024-04-22 10:42:44
{code:java}
2024-04-22 10:42:44,154 INFO 
[om131@group-2BC026ED99AC-FollowerState]-org.apache.ratis.server.impl.FollowerState:
 om131@group-2BC026ED99AC-FollowerState: change to CANDIDATE, 
lastRpcElapsedTime:6122814232ns, electionTimeout:5091ms
2024-04-22 10:42:44,156 INFO 
[om131@group-2BC026ED99AC-FollowerState]-org.apache.ratis.server.impl.RoleInfo: 
om131: shutdown om131@group-2BC026ED99AC-FollowerState
2024-04-22 10:42:44,156 INFO 
[om131@group-2BC026ED99AC-FollowerState]-org.apache.ratis.server.RaftServer$Division:
 om131@group-2BC026ED99AC: changes role from  FOLLOWER to CANDIDATE at term 21 
for changeToCandidate
2024-04-22 10:42:44,166 INFO 
[om131@group-2BC026ED99AC-FollowerState]-org.apache.ratis.server.RaftServerConfigKeys:
 raft.server.leaderelection.pre-vote = true (default)
2024-04-22 10:42:44,166 INFO 
[om131@group-2BC026ED99AC-FollowerState]-org.apache.ratis.server.impl.RoleInfo: 
om131: start om131@group-2BC026ED99AC-LeaderElection1
2024-04-22 10:42:44,170 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.RaftServer$Division:
 om131@group-2BC026ED99AC: change Leader from om132 to null at term 21 for 
PRE_VOTE
2024-04-22 10:42:44,171 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.impl.LeaderElection:
 om131@group-2BC026ED99AC-LeaderElection1 PRE_VOTE round 0: submit vote 
requests at term 21 for 3507839: 
peers:[om132|rpc:vc0114.***.com:9872|admin:|client:|dataStre
am:|priority:0|startupRole:FOLLOWER, 
om133|rpc:vc0111.***.com:9872|admin:|client:|dataStream:|priority:0|startupRole:FOLLOWER,
 
om131|rpc:vc0113.***.com:9872|admin:|client:|dataStream:|priority:0|startupRole:FOLLOWER]|listeners:[],
 old=null
2024-04-22 10:42:44,211 INFO 
[om131@group-2BC026ED99AC-LeaderElection1-2]-org.apache.ratis.grpc.server.GrpcServerProtocolClient:
 Build channel for om133
2024-04-22 10:42:44,211 INFO 
[om131@group-2BC026ED99AC-LeaderElection1-1]-org.apache.ratis.grpc.server.GrpcServerProtocolClient:
 Build channel for om132
2024-04-22 10:42:44,473 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.impl.LeaderElection:
 om131@group-2BC026ED99AC-LeaderElection1: PRE_VOTE PASSED received 1 
response(s) and 0 exception(s):
2024-04-22 10:42:44,473 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.impl.LeaderElection:
   Response 0: om131<-om133#0:OK-t21
2024-04-22 10:42:44,473 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.impl.LeaderElection:
 om131@group-2BC026ED99AC-LeaderElection1 PRE_VOTE round 0: result PASSED
2024-04-22 10:42:44,497 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.impl.LeaderElection:
 om131@group-2BC026ED99AC-LeaderElection1 ELECTION round 0: submit vote 
requests at term 22 for 3507839: 
peers:[om132|rpc:vc0114.***.com:9872|admin:|client:|dataStre
am:|priority:0|startupRole:FOLLOWER, 
om133|rpc:vc0111.***.com:9872|admin:|client:|dataStream:|priority:0|startupRole:FOLLOWER,
 
om131|rpc:vc0113.***.com:9872|admin:|client:|dataStream:|priority:0|startupRole:FOLLOWER]|listeners:[],
 old=null
2024-04-22 10:42:44,585 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.impl.LeaderElection:
 om131@group-2BC026ED99AC-LeaderElection1: ELECTION PASSED received 1 
response(s) and 0 exception(s):
2024-04-22 10:42:44,585 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.impl.LeaderElection:
   Response 0: om131<-om133#0:OK-t22
2024-04-22 10:42:44,585 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.impl.LeaderElection:
 om131@group-2BC026ED99AC-LeaderElection1 ELECTION round 0: result PASSED
2024-04-22 10:42:44,585 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.impl.RoleInfo:
 om131: shutdown om131@group-2BC026ED99AC-LeaderElection1
2024-04-22 10:42:44,587 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.RaftServer$Division:
 om131@group-2BC026ED99AC: changes role from CANDIDATE to LEADER at term 22 for 
changeToLeader
2024-04-22 10:42:44,589 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.RaftServer$Division:
 om131@group-2BC026ED99AC: change Leader from null to om131 at term 22 for 
becomeLeader, leader elected after 418ms
2024-04-22 10:42:44,593 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.RaftServerConfigKeys:
 raft.server.staging.catchup.gap = 1000 (default)
2024-04-22 10:42:44,598 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.RaftServerConfigKeys:
 raft.server.write.element-limit = 4096 (default)
2024-04-22 10:42:44,599 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.RaftServerConfigKeys:
 raft.server.write.byte-limit = 64MB (=67108864) (default)
2024-04-22 10:42:44,611 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.RaftServerConfigKeys:
 raft.server.watch.timeout = 10s (default)
2024-04-22 10:42:44,611 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.RaftServerConfigKeys:
 raft.server.watch.timeout.denomination = 1s (default)
2024-04-22 10:42:44,612 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.RaftServerConfigKeys:
 raft.server.watch.element-limit = 65536 (default)
2024-04-22 10:42:44,617 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.RaftServerConfigKeys:
 raft.server.write.element-limit = 4096 (default)
2024-04-22 10:42:44,619 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.RaftServerConfigKeys:
 raft.server.write.follower.gap.ratio.max = -1.0 (default)
2024-04-22 10:42:44,631 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.RaftServerConfigKeys:
 raft.server.log.appender.snapshot.chunk.size.max = 16MB (=16777216) (default)
2024-04-22 10:42:44,631 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.RaftServerConfigKeys:
 raft.server.log.appender.buffer.byte-limit = 33554432 (custom)
2024-04-22 10:42:44,631 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.RaftServerConfigKeys:
 raft.server.log.appender.buffer.element-limit = 1024 (custom)
2024-04-22 10:42:44,633 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.RaftServerConfigKeys:
 raft.server.log.appender.wait-time.min = 0ms (custom)
2024-04-22 10:42:44,633 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.grpc.GrpcConfigKeys:
 raft.grpc.server.leader.outstanding.appends.max = 128 (default)
2024-04-22 10:42:44,634 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.RaftServerConfigKeys:
 raft.server.rpc.request.timeout = 3000ms (default)
2024-04-22 10:42:44,635 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.grpc.GrpcConfigKeys:
 raft.grpc.server.install_snapshot.request.element-limit = 8 (default)
2024-04-22 10:42:44,635 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.grpc.GrpcConfigKeys:
 raft.grpc.server.install_snapshot.request.timeout = 3000ms (default)
2024-04-22 10:42:44,635 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.RaftServerConfigKeys:
 raft.server.log.appender.install.snapshot.enabled = false (custom)
2024-04-22 10:42:44,635 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.grpc.GrpcConfigKeys:
 raft.grpc.server.heartbeat.channel = true (default)
2024-04-22 10:42:44,639 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.RaftServerConfigKeys:
 raft.server.log.appender.snapshot.chunk.size.max = 16MB (=16777216) (default)
2024-04-22 10:42:44,639 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.RaftServerConfigKeys:
 raft.server.log.appender.buffer.byte-limit = 33554432 (custom)
2024-04-22 10:42:44,639 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.RaftServerConfigKeys:
 raft.server.log.appender.buffer.element-limit = 1024 (custom)
2024-04-22 10:42:44,639 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.RaftServerConfigKeys:
 raft.server.log.appender.wait-time.min = 0ms (custom)
2024-04-22 10:42:44,639 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.grpc.GrpcConfigKeys:
 raft.grpc.server.leader.outstanding.appends.max = 128 (default)
2024-04-22 10:42:44,639 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.RaftServerConfigKeys:
 raft.server.rpc.request.timeout = 3000ms (default)
2024-04-22 10:42:44,639 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.grpc.GrpcConfigKeys:
 raft.grpc.server.install_snapshot.request.element-limit = 8 (default)
2024-04-22 10:42:44,639 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.grpc.GrpcConfigKeys:
 raft.grpc.server.install_snapshot.request.timeout = 3000ms (default)
2024-04-22 10:42:44,639 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.RaftServerConfigKeys:
 raft.server.log.appender.install.snapshot.enabled = false (custom)
2024-04-22 10:42:44,639 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.grpc.GrpcConfigKeys:
 raft.grpc.server.heartbeat.channel = true (default)
2024-04-22 10:42:44,640 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.impl.RoleInfo:
 om131: start om131@group-2BC026ED99AC-LeaderStateImpl
2024-04-22 10:42:44,644 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
 om131@group-2BC026ED99AC-SegmentedRaftLogWorker: Rolling segment 
log-3569775_3573639 to index:3573639
2024-04-22 10:42:44,646 INFO 
[om131@group-2BC026ED99AC-SegmentedRaftLogWorker]-org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
 om131@group-2BC026ED99AC-SegmentedRaftLogWorker: Rolled log segment from 
/var/lib/hadoop-ozone/om/ratis/6768abc1-a68c-3f5e-a8d2-2bc026ed99ac/curren
t/log_inprogress_3569775 to 
/var/lib/hadoop-ozone/om/ratis/6768abc1-a68c-3f5e-a8d2-2bc026ed99ac/current/log_3569775-3573639
2024-04-22 10:42:44,663 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.RaftServer$Division:
 om131@group-2BC026ED99AC: set configuration 3573640: 
peers:[om132|rpc:vc0114.***.com:9872|admin:|client:|dataStream:|priority:0|startupRole:FOLLOWER,
 om133|rpc:vc01
11.***.com:9872|admin:|client:|dataStream:|priority:0|startupRole:FOLLOWER, 
om131|rpc:vc0113.***.com:9872|admin:|client:|dataStream:|priority:0|startupRole:FOLLOWER]|listeners:[],
 old=null
2024-04-22 10:42:44,726 WARN 
[grpc-default-executor-9]-org.apache.ratis.grpc.server.GrpcLogAppender: 
om131@group-2BC026ED99AC->om132-AppendLogResponseHandler: Failed appendEntries: 
org.apache.ratis.protocol.exceptions.ServerNotReadyException: 
om132@group-2BC026ED99AC is not in [STARTING, RUN
NING]: current state is CLOSING
2024-04-22 10:42:44,728 INFO 
[grpc-default-executor-9]-org.apache.ratis.server.leader.FollowerInfo: 
om131@group-2BC026ED99AC->om132: decreaseNextIndex nextIndex: 
updateUnconditionally 3573641 -> 3573640
2024-04-22 10:42:44,730 INFO 
[om131@group-2BC026ED99AC-SegmentedRaftLogWorker]-org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
 om131@group-2BC026ED99AC-SegmentedRaftLogWorker: created new log segment 
/var/lib/hadoop-ozone/om/ratis/6768abc1-a68c-3f5e-a8d2-2bc026ed99ac/curren
t/log_inprogress_3573640
2024-04-22 10:42:44,802 WARN 
[grpc-default-executor-13]-org.apache.ratis.grpc.server.GrpcLogAppender: 
om131@group-2BC026ED99AC->om132-AppendLogResponseHandler: Failed appendEntries: 
org.apache.ratis.protocol.exceptions.ServerNotReadyException: 
om132@group-2BC026ED99AC is not in [STARTING, RU
NNING]: current state is CLOSING
2024-04-22 10:42:44,803 INFO 
[grpc-default-executor-13]-org.apache.ratis.server.leader.FollowerInfo: 
om131@group-2BC026ED99AC->om132: decreaseNextIndex nextIndex: 
updateUnconditionally 3573641 -> 3573640
2024-04-22 10:42:44,899 INFO 
[om131@group-2BC026ED99AC-StateMachineUpdater]-org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine:
 Received Configuration change notification from Ratis. New Peer list:
[id: "om132"
address: "vc0114.***.com:9872"
startupRole: FOLLOWER
, id: "om133"
address: "vc0111.***.com:9872"
startupRole: FOLLOWER
, id: "om131"
address: "vc0113.***.com:9872"
startupRole: FOLLOWER
]
{code}

OM Id and host mapping:
{code:java}
om132 => vc0114.***.com:9872
om133 => vc0111.***.com:9872
om131 => vc0113.***.com:9872
 {code}


was (Author: JIRAUSER297350):
There was a leader election around 2024-04-22 10:42:44
{code:java}
2024-04-22 10:42:44,154 INFO 
[om131@group-2BC026ED99AC-FollowerState]-org.apache.ratis.server.impl.FollowerState:
 om131@group-2BC026ED99AC-FollowerState: change to CANDIDATE, 
lastRpcElapsedTime:6122814232ns, electionTimeout:5091ms
2024-04-22 10:42:44,156 INFO 
[om131@group-2BC026ED99AC-FollowerState]-org.apache.ratis.server.impl.RoleInfo: 
om131: shutdown om131@group-2BC026ED99AC-FollowerState
2024-04-22 10:42:44,156 INFO 
[om131@group-2BC026ED99AC-FollowerState]-org.apache.ratis.server.RaftServer$Division:
 om131@group-2BC026ED99AC: changes role from  FOLLOWER to CANDIDATE at term 21 
for changeToCandidate
2024-04-22 10:42:44,166 INFO 
[om131@group-2BC026ED99AC-FollowerState]-org.apache.ratis.server.RaftServerConfigKeys:
 raft.server.leaderelection.pre-vote = true (default)
2024-04-22 10:42:44,166 INFO 
[om131@group-2BC026ED99AC-FollowerState]-org.apache.ratis.server.impl.RoleInfo: 
om131: start om131@group-2BC026ED99AC-LeaderElection1
2024-04-22 10:42:44,170 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.RaftServer$Division:
 om131@group-2BC026ED99AC: change Leader from om132 to null at term 21 for 
PRE_VOTE
2024-04-22 10:42:44,171 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.impl.LeaderElection:
 om131@group-2BC026ED99AC-LeaderElection1 PRE_VOTE round 0: submit vote 
requests at term 21 for 3507839: 
peers:[om132|rpc:vc0114.halxg.cloudera.com:9872|admin:|client:|dataStre
am:|priority:0|startupRole:FOLLOWER, 
om133|rpc:vc0111.halxg.cloudera.com:9872|admin:|client:|dataStream:|priority:0|startupRole:FOLLOWER,
 
om131|rpc:vc0113.halxg.cloudera.com:9872|admin:|client:|dataStream:|priority:0|startupRole:FOLLOWER]|listeners:[],
 old=null
2024-04-22 10:42:44,211 INFO 
[om131@group-2BC026ED99AC-LeaderElection1-2]-org.apache.ratis.grpc.server.GrpcServerProtocolClient:
 Build channel for om133
2024-04-22 10:42:44,211 INFO 
[om131@group-2BC026ED99AC-LeaderElection1-1]-org.apache.ratis.grpc.server.GrpcServerProtocolClient:
 Build channel for om132
2024-04-22 10:42:44,473 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.impl.LeaderElection:
 om131@group-2BC026ED99AC-LeaderElection1: PRE_VOTE PASSED received 1 
response(s) and 0 exception(s):
2024-04-22 10:42:44,473 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.impl.LeaderElection:
   Response 0: om131<-om133#0:OK-t21
2024-04-22 10:42:44,473 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.impl.LeaderElection:
 om131@group-2BC026ED99AC-LeaderElection1 PRE_VOTE round 0: result PASSED
2024-04-22 10:42:44,497 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.impl.LeaderElection:
 om131@group-2BC026ED99AC-LeaderElection1 ELECTION round 0: submit vote 
requests at term 22 for 3507839: 
peers:[om132|rpc:vc0114.halxg.cloudera.com:9872|admin:|client:|dataStre
am:|priority:0|startupRole:FOLLOWER, 
om133|rpc:vc0111.halxg.cloudera.com:9872|admin:|client:|dataStream:|priority:0|startupRole:FOLLOWER,
 
om131|rpc:vc0113.halxg.cloudera.com:9872|admin:|client:|dataStream:|priority:0|startupRole:FOLLOWER]|listeners:[],
 old=null
2024-04-22 10:42:44,585 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.impl.LeaderElection:
 om131@group-2BC026ED99AC-LeaderElection1: ELECTION PASSED received 1 
response(s) and 0 exception(s):
2024-04-22 10:42:44,585 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.impl.LeaderElection:
   Response 0: om131<-om133#0:OK-t22
2024-04-22 10:42:44,585 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.impl.LeaderElection:
 om131@group-2BC026ED99AC-LeaderElection1 ELECTION round 0: result PASSED
2024-04-22 10:42:44,585 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.impl.RoleInfo:
 om131: shutdown om131@group-2BC026ED99AC-LeaderElection1
2024-04-22 10:42:44,587 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.RaftServer$Division:
 om131@group-2BC026ED99AC: changes role from CANDIDATE to LEADER at term 22 for 
changeToLeader
2024-04-22 10:42:44,589 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.RaftServer$Division:
 om131@group-2BC026ED99AC: change Leader from null to om131 at term 22 for 
becomeLeader, leader elected after 418ms
2024-04-22 10:42:44,593 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.RaftServerConfigKeys:
 raft.server.staging.catchup.gap = 1000 (default)
2024-04-22 10:42:44,598 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.RaftServerConfigKeys:
 raft.server.write.element-limit = 4096 (default)
2024-04-22 10:42:44,599 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.RaftServerConfigKeys:
 raft.server.write.byte-limit = 64MB (=67108864) (default)
2024-04-22 10:42:44,611 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.RaftServerConfigKeys:
 raft.server.watch.timeout = 10s (default)
2024-04-22 10:42:44,611 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.RaftServerConfigKeys:
 raft.server.watch.timeout.denomination = 1s (default)
2024-04-22 10:42:44,612 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.RaftServerConfigKeys:
 raft.server.watch.element-limit = 65536 (default)
2024-04-22 10:42:44,617 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.RaftServerConfigKeys:
 raft.server.write.element-limit = 4096 (default)
2024-04-22 10:42:44,619 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.RaftServerConfigKeys:
 raft.server.write.follower.gap.ratio.max = -1.0 (default)
2024-04-22 10:42:44,631 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.RaftServerConfigKeys:
 raft.server.log.appender.snapshot.chunk.size.max = 16MB (=16777216) (default)
2024-04-22 10:42:44,631 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.RaftServerConfigKeys:
 raft.server.log.appender.buffer.byte-limit = 33554432 (custom)
2024-04-22 10:42:44,631 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.RaftServerConfigKeys:
 raft.server.log.appender.buffer.element-limit = 1024 (custom)
2024-04-22 10:42:44,633 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.RaftServerConfigKeys:
 raft.server.log.appender.wait-time.min = 0ms (custom)
2024-04-22 10:42:44,633 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.grpc.GrpcConfigKeys:
 raft.grpc.server.leader.outstanding.appends.max = 128 (default)
2024-04-22 10:42:44,634 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.RaftServerConfigKeys:
 raft.server.rpc.request.timeout = 3000ms (default)
2024-04-22 10:42:44,635 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.grpc.GrpcConfigKeys:
 raft.grpc.server.install_snapshot.request.element-limit = 8 (default)
2024-04-22 10:42:44,635 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.grpc.GrpcConfigKeys:
 raft.grpc.server.install_snapshot.request.timeout = 3000ms (default)
2024-04-22 10:42:44,635 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.RaftServerConfigKeys:
 raft.server.log.appender.install.snapshot.enabled = false (custom)
2024-04-22 10:42:44,635 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.grpc.GrpcConfigKeys:
 raft.grpc.server.heartbeat.channel = true (default)
2024-04-22 10:42:44,639 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.RaftServerConfigKeys:
 raft.server.log.appender.snapshot.chunk.size.max = 16MB (=16777216) (default)
2024-04-22 10:42:44,639 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.RaftServerConfigKeys:
 raft.server.log.appender.buffer.byte-limit = 33554432 (custom)
2024-04-22 10:42:44,639 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.RaftServerConfigKeys:
 raft.server.log.appender.buffer.element-limit = 1024 (custom)
2024-04-22 10:42:44,639 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.RaftServerConfigKeys:
 raft.server.log.appender.wait-time.min = 0ms (custom)
2024-04-22 10:42:44,639 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.grpc.GrpcConfigKeys:
 raft.grpc.server.leader.outstanding.appends.max = 128 (default)
2024-04-22 10:42:44,639 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.RaftServerConfigKeys:
 raft.server.rpc.request.timeout = 3000ms (default)
2024-04-22 10:42:44,639 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.grpc.GrpcConfigKeys:
 raft.grpc.server.install_snapshot.request.element-limit = 8 (default)
2024-04-22 10:42:44,639 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.grpc.GrpcConfigKeys:
 raft.grpc.server.install_snapshot.request.timeout = 3000ms (default)
2024-04-22 10:42:44,639 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.RaftServerConfigKeys:
 raft.server.log.appender.install.snapshot.enabled = false (custom)
2024-04-22 10:42:44,639 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.grpc.GrpcConfigKeys:
 raft.grpc.server.heartbeat.channel = true (default)
2024-04-22 10:42:44,640 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.impl.RoleInfo:
 om131: start om131@group-2BC026ED99AC-LeaderStateImpl
2024-04-22 10:42:44,644 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
 om131@group-2BC026ED99AC-SegmentedRaftLogWorker: Rolling segment 
log-3569775_3573639 to index:3573639
2024-04-22 10:42:44,646 INFO 
[om131@group-2BC026ED99AC-SegmentedRaftLogWorker]-org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
 om131@group-2BC026ED99AC-SegmentedRaftLogWorker: Rolled log segment from 
/var/lib/hadoop-ozone/om/ratis/6768abc1-a68c-3f5e-a8d2-2bc026ed99ac/curren
t/log_inprogress_3569775 to 
/var/lib/hadoop-ozone/om/ratis/6768abc1-a68c-3f5e-a8d2-2bc026ed99ac/current/log_3569775-3573639
2024-04-22 10:42:44,663 INFO 
[om131@group-2BC026ED99AC-LeaderElection1]-org.apache.ratis.server.RaftServer$Division:
 om131@group-2BC026ED99AC: set configuration 3573640: 
peers:[om132|rpc:vc0114.halxg.cloudera.com:9872|admin:|client:|dataStream:|priority:0|startupRole:FOLLOWER,
 om133|rpc:vc01
11.halxg.cloudera.com:9872|admin:|client:|dataStream:|priority:0|startupRole:FOLLOWER,
 
om131|rpc:vc0113.halxg.cloudera.com:9872|admin:|client:|dataStream:|priority:0|startupRole:FOLLOWER]|listeners:[],
 old=null
2024-04-22 10:42:44,726 WARN 
[grpc-default-executor-9]-org.apache.ratis.grpc.server.GrpcLogAppender: 
om131@group-2BC026ED99AC->om132-AppendLogResponseHandler: Failed appendEntries: 
org.apache.ratis.protocol.exceptions.ServerNotReadyException: 
om132@group-2BC026ED99AC is not in [STARTING, RUN
NING]: current state is CLOSING
2024-04-22 10:42:44,728 INFO 
[grpc-default-executor-9]-org.apache.ratis.server.leader.FollowerInfo: 
om131@group-2BC026ED99AC->om132: decreaseNextIndex nextIndex: 
updateUnconditionally 3573641 -> 3573640
2024-04-22 10:42:44,730 INFO 
[om131@group-2BC026ED99AC-SegmentedRaftLogWorker]-org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
 om131@group-2BC026ED99AC-SegmentedRaftLogWorker: created new log segment 
/var/lib/hadoop-ozone/om/ratis/6768abc1-a68c-3f5e-a8d2-2bc026ed99ac/curren
t/log_inprogress_3573640
2024-04-22 10:42:44,802 WARN 
[grpc-default-executor-13]-org.apache.ratis.grpc.server.GrpcLogAppender: 
om131@group-2BC026ED99AC->om132-AppendLogResponseHandler: Failed appendEntries: 
org.apache.ratis.protocol.exceptions.ServerNotReadyException: 
om132@group-2BC026ED99AC is not in [STARTING, RU
NNING]: current state is CLOSING
2024-04-22 10:42:44,803 INFO 
[grpc-default-executor-13]-org.apache.ratis.server.leader.FollowerInfo: 
om131@group-2BC026ED99AC->om132: decreaseNextIndex nextIndex: 
updateUnconditionally 3573641 -> 3573640
2024-04-22 10:42:44,899 INFO 
[om131@group-2BC026ED99AC-StateMachineUpdater]-org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine:
 Received Configuration change notification from Ratis. New Peer list:
[id: "om132"
address: "vc0114.***.com:9872"
startupRole: FOLLOWER
, id: "om133"
address: "vc0111.***.com:9872"
startupRole: FOLLOWER
, id: "om131"
address: "vc0113.***.com:9872"
startupRole: FOLLOWER
]
{code}

OM Id and host mapping:
{code:java}
om132 => vc0114.***.com:9872
om133 => vc0111.***.com:9872
om131 => vc0113.***.com:9872
 {code}

> Unable to load snapshot exception encountered in a LR setup
> -----------------------------------------------------------
>
>                 Key: HDDS-10738
>                 URL: https://issues.apache.org/jira/browse/HDDS-10738
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: OM, Snapshot
>            Reporter: Jyotirmoy Sinha
>            Assignee: Hemant Kumar
>            Priority: Major
>              Labels: ozone-snapshot
>
> Scenario :
>  * Generate data over parallel threads over various volume/buckets
>  * Perform parallel snapshot create/delete/list operations over above buckets
>  * Perform parallel snapdiff operations over each bucket
>  * Perform parallel read operations of snapshot contents
>  * Introduce OM and cluster restarts in between along with DN decommissioning 
> and balancer restarts.
> Observation - When multiple threads are running with snapshot operations the 
> snapshot path contents are not accessible even after 20 mins
> Snapshot creation log OM -
> {code:java}
> 2024-04-22 11:18:49,123 INFO [OM StateMachine ApplyTransaction Thread - 
> 0]-org.apache.hadoop.ozone.om.request.snapshot.OMSnapshotCreateRequest: 
> Created snapshot: 'snap1713809817' with snapshotId: 
> '849a95ab-c5bc-4b78-9d0a-fdad34fd331a' under path 'vol-dp4tz/buck-cp6e6' 
> {code}
> OM Error stacktrace - 
> {code:java}
> 2024-04-22 11:42:04,768 INFO [IPC Server handler 37 on 
> 9862]-org.apache.hadoop.hdds.utils.db.RDBCheckpointUtils: Checkpoint 
> directory: 60 didn't get created in 
> /var/lib/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-849a95ab-c5bc-4b78-9d0a-fdad34fd331a
>  secs.
> 2024-04-22 11:42:04,768 ERROR [IPC Server handler 37 on 
> 9862]-org.apache.hadoop.ozone.om.OmSnapshotManager: Failed to retrieve 
> snapshot: /vol-dp4tz/buck-cp6e6/snap1713809817
> TIMEOUT org.apache.hadoop.ozone.om.exceptions.OMException: Unable to load 
> snapshot. Snapshot checkpoint directory 
> '/var/lib/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-849a95ab-c5bc-4b78-9d0a-fdad34fd331a'
>  does not exist yet. Please wait a few more seconds before retrying
>         at 
> org.apache.hadoop.ozone.om.snapshot.SnapshotUtils.checkSnapshotDirExist(SnapshotUtils.java:113)
>         at 
> org.apache.hadoop.ozone.om.OmMetadataManagerImpl.<init>(OmMetadataManagerImpl.java:406)
>         at 
> org.apache.hadoop.ozone.om.OmSnapshotManager$1.load(OmSnapshotManager.java:357)
>         at 
> org.apache.hadoop.ozone.om.OmSnapshotManager$1.load(OmSnapshotManager.java:1)
>         at 
> org.apache.hadoop.ozone.om.snapshot.SnapshotCache.lambda$1(SnapshotCache.java:147)
>         at 
> java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1853)
>         at 
> org.apache.hadoop.ozone.om.snapshot.SnapshotCache.get(SnapshotCache.java:143)
>         at 
> org.apache.hadoop.ozone.om.OmSnapshotManager.checkForSnapshot(OmSnapshotManager.java:625)
>         at 
> org.apache.hadoop.ozone.om.OzoneManager.getReader(OzoneManager.java:4634)
>         at 
> org.apache.hadoop.ozone.om.OzoneManager.getFileStatus(OzoneManager.java:3572)
>         at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.getOzoneFileStatus(OzoneManagerRequestHandler.java:1002)
>         at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleReadRequest(OzoneManagerRequestHandler.java:257)
>         at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitReadRequestToOM(OzoneManagerProtocolServerSideTranslatorPB.java:220)
>         at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:174)
>         at 
> org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:87)
>         at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:143)
>         at 
> org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:994)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:922)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2899)
> 2024-04-22 11:42:04,770 WARN [IPC Server handler 37 on 
> 9862]-org.apache.hadoop.ipc.Server: IPC Server handler 37 on 9862, call 
> Call#2 Retry#0 
> org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest from 
> 10.17.207.24:57514
> java.lang.IllegalStateException: TIMEOUT 
> org.apache.hadoop.ozone.om.exceptions.OMException: Unable to load snapshot. 
> Snapshot checkpoint directory 
> '/var/lib/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-849a95ab-c5bc-4b78-9d0a-fdad34fd331a'
>  does not exist yet. Please wait a few more seconds before retrying {code}
> The above error is coming for multiple snapshots repeatedly and mostly coming 
> in parallel snapshot operations across various volume/buckets, not in serial 
> operations.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to