[ 
https://issues.apache.org/jira/browse/RATIS-2039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz-wo Sze resolved RATIS-2039.
-------------------------------
    Resolution: Not A Problem

Resolving this as "not a problem".

> Leader lection requests should apply timeout close to election timeout
> ----------------------------------------------------------------------
>
>                 Key: RATIS-2039
>                 URL: https://issues.apache.org/jira/browse/RATIS-2039
>             Project: Ratis
>          Issue Type: Improvement
>            Reporter: Sammi Chen
>            Priority: Major
>
> In the following log,  election timeout is 5s.  om62 election failed when its 
> vote request timeouted at GRPC level with 3s although it passed the pre-vote 
> process.  After the request timeout, there is no retry, although it's still 
> 2s left toward election timeout. The ideal state is om62 requests use the 
> same(or a little bit smaller than) timeout as election timeout, so that in 
> the following case, the election will finally got response and finish 
> successfully. 
> {code:java}
> 2024-03-04 04:14:21,186 WARN 
> [grpc-default-executor-30]-org.apache.ratis.grpc.server.GrpcLogAppender: 
> om62@group-009531620E4C->om63-AppendLogResponseHandler: Failed appendEntries: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io 
> exception
> 2024-03-04 04:14:21,188 WARN 
> [grpc-default-executor-30]-org.apache.ratis.grpc.server.GrpcLogAppender: 
> om62@group-009531620E4C->om63-GrpcLogAppender: Follower failed (request=null, 
> errorCount=796555); keep nextIndex (209384) unchanged and retry.
> 2024-03-04 04:14:25,169 INFO [Socket Reader #1 for port 
> 9862]-SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for 
> [email protected] (auth:TOKEN)
> 2024-03-04 04:14:25,171 INFO [Socket Reader #1 for port 
> 9862]-SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager:
>  Authorization successful for [email protected] (auth:TOKEN) for 
> protocol=interface org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol
> 2024-03-04 04:14:26,186 WARN 
> [timer5]-org.apache.ratis.grpc.server.GrpcLogAppender: 
> om62@group-009531620E4C->om63-AppendLogResponseHandler: Failed appendEntries 
> (Repeated 8 times in the last 5.000s): 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io 
> exception
> 2024-03-04 04:14:26,188 WARN 
> [timer6]-org.apache.ratis.grpc.server.GrpcLogAppender: 
> om62@group-009531620E4C->om63-GrpcLogAppender: Follower failed (request=null, 
> errorCount=796562); keep nextIndex (209384) unchanged and retry. (Repeated 8 
> times in the last 5.000s)
> 2024-03-04 04:14:26,193 WARN 
> [grpc-default-executor-21]-org.apache.ratis.grpc.server.GrpcLogAppender: 
> om62@group-009531620E4C->om63-AppendLogResponseHandler: Failed appendEntries: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io 
> exception
> 2024-03-04 04:14:26,194 WARN 
> [grpc-default-executor-21]-org.apache.ratis.grpc.server.GrpcLogAppender: 
> om62@group-009531620E4C->om63-GrpcLogAppender: Follower failed (request=null, 
> errorCount=796563); keep nextIndex (209384) unchanged and retry.
> 2024-03-04 04:14:26,261 WARN 
> [timer6]-org.apache.ratis.grpc.server.GrpcLogAppender: 
> om62@group-009531620E4C->om61-GrpcLogAppender: Timed out HEARTBEAT 
> appendEntries, errorCount=1, 
> request=AppendEntriesRequest:cid=254709,entriesCount=0
> 2024-03-04 04:14:26,653 WARN 
> [om62@group-009531620E4C-LeaderStateImpl]-org.apache.ratis.server.RaftServer$Division:
>  om62@group-009531620E4C-LeaderStateImpl: Lost leadership on term: 19. 
> Election timeout: 5200ms. In charge for: 415277057ms. Conf: 142632: 
> peers:[om61|ccycloud-2.ozn-hbchf3oz.root.comops.site:9872, 
> om62|ccycloud-6.ozn-hbchf3oz.root.comops.site:9872, 
> om63|ccycloud-4.ozn-hbchf3oz.root.comops.site:9872]|listeners:[], old=null
> 2024-03-04 04:14:26,659 WARN 
> [om62@group-009531620E4C-LeaderStateImpl]-org.apache.ratis.server.RaftServer$Division:
>  Follower om62@group-009531620E4C->om61(c209383,m209383,n209384, 
> caughtUp=true, lastRpcSendTime=898, lastRpcResponseTime=5900)
> 2024-03-04 04:14:26,659 WARN 
> [om62@group-009531620E4C-LeaderStateImpl]-org.apache.ratis.server.RaftServer$Division:
>  Follower om62@group-009531620E4C->om63(c-1,m-1,n209384, caughtUp=true, 
> lastRpcSendTime=464, lastRpcResponseTime=415282260)
> 2024-03-04 04:14:26,661 INFO 
> [om62@group-009531620E4C-LeaderStateImpl]-org.apache.ratis.server.RaftServer$Division:
>  om62@group-009531620E4C: changes role from    LEADER to FOLLOWER at term 19 
> for StepDownReason:LOST_MAJORITY_HEARTBEATS
> 2024-03-04 04:14:26,662 INFO 
> [om62@group-009531620E4C-LeaderStateImpl]-org.apache.ratis.server.impl.RoleInfo:
>  om62: shutdown om62@group-009531620E4C-LeaderStateImpl
> 2024-03-04 04:14:26,681 WARN 
> [om62@group-009531620E4C->om63-GrpcLogAppender-LogAppenderDaemon]-org.apache.ratis.grpc.server.GrpcLogAppender:
>  om62@group-009531620E4C->om63-GrpcLogAppender: Wait interrupted by 
> java.lang.InterruptedException
> 2024-03-04 04:14:26,681 WARN 
> [om62@group-009531620E4C->om61-GrpcLogAppender-LogAppenderDaemon]-org.apache.ratis.grpc.server.GrpcLogAppender:
>  om62@group-009531620E4C->om61-GrpcLogAppender: Wait interrupted by 
> java.lang.InterruptedException
> 2024-03-04 04:14:26,688 INFO 
> [om62@group-009531620E4C-LeaderStateImpl]-org.apache.ratis.server.impl.PendingRequests:
>  om62@group-009531620E4C-PendingRequests: sendNotLeaderResponses
> 2024-03-04 04:14:26,700 INFO 
> [om62@group-009531620E4C-LeaderStateImpl]-org.apache.ratis.server.RaftServer$Division:
>  om62@group-009531620E4C: change Leader from om62 to null at term 19 for 
> StepDownReason:LOST_MAJORITY_HEARTBEATS
> 2024-03-04 04:14:26,702 INFO 
> [om62@group-009531620E4C-LeaderStateImpl]-org.apache.ratis.server.impl.RoleInfo:
>  om62: start om62@group-009531620E4C-FollowerState
> 2024-03-04 04:14:27,443 WARN [Socket Reader #1 for port 
> 9862]-SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for 
> 10.140.105.13:51482:null (DIGEST-MD5: IO error acquiring password) with true 
> cause: (OM:om62 is not the leader. Could not determine the leader node.)
> 2024-03-04 04:14:27,510 WARN 
> [timer6]-org.apache.ratis.grpc.server.GrpcLogAppender: 
> om62@group-009531620E4C->om61-GrpcLogAppender: Timed out HEARTBEAT 
> appendEntries, errorCount=1, 
> request=AppendEntriesRequest:cid=254710,entriesCount=0
> 2024-03-04 04:14:28,656 INFO 
> [grpc-default-executor-30]-org.apache.ratis.grpc.server.GrpcLogAppender: 
> om62@group-009531620E4C->om61-AppendLogResponseHandler: follower responses 
> appendEntries COMPLETED
> 2024-03-04 04:14:28,704 INFO 
> [grpc-default-executor-30]-org.apache.ratis.grpc.server.GrpcLogAppender: 
> om62@group-009531620E4C->om61-AppendLogResponseHandler: follower responses 
> appendEntries COMPLETED
> 2024-03-04 04:14:30,316 WARN [Socket Reader #1 for port 
> 9862]-SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for 
> 10.140.105.13:60082:null (DIGEST-MD5: IO error acquiring password) with true 
> cause: (OM:om62 is not the leader. Could not determine the leader node.)
> 2024-03-04 04:14:30,586 ERROR 
> [qtp1802837169-142]-org.apache.hadoop.ozone.om.OzoneManager: No leader found
> 2024-03-04 04:14:31,193 WARN 
> [timer6]-org.apache.ratis.grpc.server.GrpcLogAppender: 
> om62@group-009531620E4C->om63-AppendLogResponseHandler: Failed appendEntries 
> (Repeated 2 times in the last 5.000s): 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io 
> exception
> 2024-03-04 04:14:31,194 WARN 
> [timer7]-org.apache.ratis.grpc.server.GrpcLogAppender: 
> om62@group-009531620E4C->om63-GrpcLogAppender: Follower failed (request=null, 
> errorCount=796564); keep nextIndex (209384) unchanged and retry. (Repeated 2 
> times in the last 5.000s)
> 2024-03-04 04:14:31,902 INFO 
> [om62@group-009531620E4C-FollowerState]-org.apache.ratis.server.impl.FollowerState:
>  om62@group-009531620E4C-FollowerState: Skipping leader election since it 
> stepped down recently (elapsed = 5199ms < waitTime = 10s)
> 2024-03-04 04:14:32,541 WARN [Socket Reader #1 for port 
> 9862]-SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for 
> 10.140.105.13:60094:null (DIGEST-MD5: IO error acquiring password) with true 
> cause: (OM:om62 is not the leader. Could not determine the leader node.)
> 2024-03-04 04:14:34,582 WARN [Socket Reader #1 for port 
> 9862]-SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for 
> 10.140.105.13:60102:null (DIGEST-MD5: IO error acquiring password) with true 
> cause: (OM:om62 is not the leader. Could not determine the leader node.)
> 2024-03-04 04:14:34,884 WARN [Socket Reader #1 for port 
> 9862]-SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for 
> 10.140.121.6:51180:null (DIGEST-MD5: IO error acquiring password) with true 
> cause: (OM:om62 is not the leader. Could not determine the leader node.)
> 2024-03-04 04:14:36,712 WARN [Socket Reader #1 for port 
> 9862]-SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for 
> 10.140.105.13:60112:null (DIGEST-MD5: IO error acquiring password) with true 
> cause: (OM:om62 is not the leader. Could not determine the leader node.)
> 2024-03-04 04:14:36,898 WARN [Socket Reader #1 for port 
> 9862]-SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for 
> 10.140.121.6:35118:null (DIGEST-MD5: IO error acquiring password) with true 
> cause: (OM:om62 is not the leader. Could not determine the leader node.)
> 2024-03-04 04:14:37,061 INFO 
> [om62@group-009531620E4C-FollowerState]-org.apache.ratis.server.impl.FollowerState:
>  om62@group-009531620E4C-FollowerState: change to CANDIDATE, 
> lastRpcElapsedTime:10360116624ns, electionTimeout:5159ms
> 2024-03-04 04:14:37,061 INFO 
> [om62@group-009531620E4C-FollowerState]-org.apache.ratis.server.impl.RoleInfo:
>  om62: shutdown om62@group-009531620E4C-FollowerState
> 2024-03-04 04:14:37,061 INFO 
> [om62@group-009531620E4C-FollowerState]-org.apache.ratis.server.RaftServer$Division:
>  om62@group-009531620E4C: changes role from  FOLLOWER to CANDIDATE at term 19 
> for changeToCandidate
> 2024-03-04 04:14:37,066 INFO 
> [om62@group-009531620E4C-FollowerState]-org.apache.ratis.server.RaftServerConfigKeys:
>  raft.server.leaderelection.pre-vote = true (default)
> 2024-03-04 04:14:37,066 INFO 
> [om62@group-009531620E4C-FollowerState]-org.apache.ratis.server.impl.RoleInfo:
>  om62: start om62@group-009531620E4C-LeaderElection8
> 2024-03-04 04:14:37,068 INFO 
> [om62@group-009531620E4C-LeaderElection8]-org.apache.ratis.server.impl.LeaderElection:
>  om62@group-009531620E4C-LeaderElection8 PRE_VOTE round 0: submit vote 
> requests at term 19 for 142632: 
> peers:[om61|ccycloud-2.ozn-hbchf3oz.root.comops.site:9872, 
> om62|ccycloud-6.ozn-hbchf3oz.root.comops.site:9872, 
> om63|ccycloud-4.ozn-hbchf3oz.root.comops.site:9872]|listeners:[], old=null
> 2024-03-04 04:14:37,091 INFO 
> [om62@group-009531620E4C-LeaderElection8]-org.apache.ratis.server.impl.LeaderElection:
>  om62@group-009531620E4C-LeaderElection8 got exception when requesting votes: 
> java.util.concurrent.ExecutionException: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io 
> exception
> 2024-03-04 04:14:38,524 WARN [Socket Reader #1 for port 
> 9862]-SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for 
> 10.140.8.131:60396:null (DIGEST-MD5: IO error acquiring password) with true 
> cause: (OM:om62 is not the leader. Could not determine the leader node.)
> 2024-03-04 04:14:38,801 WARN [Socket Reader #1 for port 
> 9862]-SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for 
> 10.140.105.13:60118:null (DIGEST-MD5: IO error acquiring password) with true 
> cause: (OM:om62 is not the leader. Could not determine the leader node.)
> 2024-03-04 04:14:39,165 WARN [Socket Reader #1 for port 
> 9862]-SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for 
> 10.140.121.6:35128:null (DIGEST-MD5: IO error acquiring password) with true 
> cause: (OM:om62 is not the leader. Could not determine the leader node.)
> 2024-03-04 04:14:39,370 INFO 
> [om62@group-009531620E4C-LeaderElection8]-org.apache.ratis.server.impl.LeaderElection:
>  om62@group-009531620E4C-LeaderElection8: PRE_VOTE PASSED received 1 
> response(s) and 1 exception(s):
> 2024-03-04 04:14:39,372 INFO 
> [om62@group-009531620E4C-LeaderElection8]-org.apache.ratis.server.impl.LeaderElection:
>    Response 0: om62<-om61#0:OK-t19
> 2024-03-04 04:14:39,373 INFO 
> [om62@group-009531620E4C-LeaderElection8]-org.apache.ratis.server.impl.LeaderElection:
>    Exception 1: java.util.concurrent.ExecutionException: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io 
> exception
> 2024-03-04 04:14:39,373 INFO 
> [om62@group-009531620E4C-LeaderElection8]-org.apache.ratis.server.impl.LeaderElection:
>  om62@group-009531620E4C-LeaderElection8 PRE_VOTE round 0: result PASSED
> 2024-03-04 04:14:39,411 INFO 
> [om62@group-009531620E4C-LeaderElection8]-org.apache.ratis.server.impl.LeaderElection:
>  om62@group-009531620E4C-LeaderElection8 ELECTION round 0: submit vote 
> requests at term 20 for 142632: 
> peers:[om61|ccycloud-2.ozn-hbchf3oz.root.comops.site:9872, 
> om62|ccycloud-6.ozn-hbchf3oz.root.comops.site:9872, 
> om63|ccycloud-4.ozn-hbchf3oz.root.comops.site:9872]|listeners:[], old=null
> 2024-03-04 04:14:39,413 INFO 
> [om62@group-009531620E4C-LeaderElection8]-org.apache.ratis.server.impl.LeaderElection:
>  om62@group-009531620E4C-LeaderElection8 got exception when requesting votes: 
> java.util.concurrent.ExecutionException: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io 
> exception
> 2024-03-04 04:14:39,551 WARN [Socket Reader #1 for port 
> 9862]-SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for 
> 10.140.8.131:59726:null (DIGEST-MD5: IO error acquiring password) with true 
> cause: (OM:om62 is not the leader. Could not determine the leader node.)
> 2024-03-04 04:14:41,081 WARN [Socket Reader #1 for port 
> 9862]-SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for 
> 10.140.105.13:59576:null (DIGEST-MD5: IO error acquiring password) with true 
> cause: (OM:om62 is not the leader. Could not determine the leader node.)
> 2024-03-04 04:14:41,085 WARN [Socket Reader #1 for port 
> 9862]-SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for 
> 10.140.8.131:59734:null (DIGEST-MD5: IO error acquiring password) with true 
> cause: (OM:om62 is not the leader. Could not determine the leader node.)
> 2024-03-04 04:14:41,376 WARN [Socket Reader #1 for port 
> 9862]-SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for 
> 10.140.121.6:35140:null (DIGEST-MD5: IO error acquiring password) with true 
> cause: (OM:om62 is not the leader. Could not determine the leader node.)
> 2024-03-04 04:14:41,574 WARN [Socket Reader #1 for port 
> 9862]-SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for 
> 10.140.8.131:59746:null (DIGEST-MD5: IO error acquiring password) with true 
> cause: (OM:om62 is not the leader. Could not determine the leader node.)
> 2024-03-04 04:14:41,919 WARN [Socket Reader #1 for port 
> 9862]-SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for 
> 10.140.8.131:59754:null (DIGEST-MD5: IO error acquiring password) with true 
> cause: (OM:om62 is not the leader. Could not determine the leader node.)
> 2024-03-04 04:14:42,414 INFO 
> [om62@group-009531620E4C-LeaderElection8]-org.apache.ratis.server.impl.LeaderElection:
>  om62@group-009531620E4C-LeaderElection8 got exception when requesting votes: 
> java.util.concurrent.ExecutionException: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
> DEADLINE_EXCEEDED: deadline exceeded after 2.999711304s. 
> [remote_addr=ccycloud-2.ozn-hbchf3oz.root.comops.site/10.140.146.6:9872]
> 2024-03-04 04:14:42,414 INFO 
> [om62@group-009531620E4C-LeaderElection8]-org.apache.ratis.server.impl.LeaderElection:
>  om62@group-009531620E4C-LeaderElection8: ELECTION REJECTED received 0 
> response(s) and 2 exception(s):
> 2024-03-04 04:14:42,414 INFO 
> [om62@group-009531620E4C-LeaderElection8]-org.apache.ratis.server.impl.LeaderElection:
>    Exception 0: java.util.concurrent.ExecutionException: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io 
> exception
> 2024-03-04 04:14:42,414 INFO 
> [om62@group-009531620E4C-LeaderElection8]-org.apache.ratis.server.impl.LeaderElection:
>    Exception 1: java.util.concurrent.ExecutionException: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
> DEADLINE_EXCEEDED: deadline exceeded after 2.999711304s. 
> [remote_addr=ccycloud-2.ozn-hbchf3oz.root.comops.site/10.140.146.6:9872]
> 2024-03-04 04:14:42,414 INFO 
> [om62@group-009531620E4C-LeaderElection8]-org.apache.ratis.server.impl.LeaderElection:
>  om62@group-009531620E4C-LeaderElection8 ELECTION round 0: result REJECTED
> 2024-03-04 04:14:42,414 INFO 
> [om62@group-009531620E4C-LeaderElection8]-org.apache.ratis.server.RaftServer$Division:
>  om62@group-009531620E4C: changes role from CANDIDATE to FOLLOWER at term 20 
> for REJECTED
> 2024-03-04 04:14:42,414 INFO 
> [om62@group-009531620E4C-LeaderElection8]-org.apache.ratis.server.impl.RoleInfo:
>  om62: shutdown om62@group-009531620E4C-LeaderElection8
>  {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to