[ 
https://issues.apache.org/jira/browse/RATIS-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17457112#comment-17457112
 ] 

Sammi Chen commented on RATIS-1465:
-----------------------------------

Here are the logs:

2021-12-02 13:57:37,719 [98e5b27a-c3e9-4f86-ab85-b2caf84f012b@group-151DB3A92008-LeaderStateImpl] WARN org.apache.ratis.server.RaftServer$Division: 98e5b27a-c3e9-4f86-ab85-b2caf84f012b@group-151DB3A92008-LeaderStateImpl: Lost leadership on term: 1. Election timeout: 5200ms. In charge for: 2290684ms. Conf: 0: [5a4a8be1-c921-4ca7-af7c-62a37a55cab7|rpc:9.186.21.247:9856|admin:9.186.21.247:9857|client:9.186.21.247:9858|dataStream:|priority:0, efdf0ed2-f836-4f4b-9dc8-981416d8a68d|rpc:9.37.156.222:9856|admin:9.37.156.222:9857|client:9.37.156.222:9858|dataStream:|priority:0, 98e5b27a-c3e9-4f86-ab85-b2caf84f012b|rpc:9.186.21.242:9856|admin:9.186.21.242:9857|client:9.186.21.242:9858|dataStream:|priority:1], old=null
2021-12-02 13:57:37,719 [98e5b27a-c3e9-4f86-ab85-b2caf84f012b@group-151DB3A92008-LeaderStateImpl] WARN org.apache.ratis.server.RaftServer$Division: Follower 98e5b27a-c3e9-4f86-ab85-b2caf84f012b@group-151DB3A92008->5a4a8be1-c921-4ca7-af7c-62a37a55cab7(c138703,m138989,n139378, attendVote=true, lastRpcSendTime=987, lastRpcResponseTime=8491)
2021-12-02 13:57:37,719 [98e5b27a-c3e9-4f86-ab85-b2caf84f012b@group-151DB3A92008-LeaderStateImpl] WARN org.apache.ratis.server.RaftServer$Division: Follower 98e5b27a-c3e9-4f86-ab85-b2caf84f012b@group-151DB3A92008->efdf0ed2-f836-4f4b-9dc8-981416d8a68d(c138814,m139094,n139378, attendVote=true, lastRpcSendTime=318, lastRpcResponseTime=6570)
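
To make the log easier to read: both followers report a lastRpcResponseTime (8491 and 6570) above the 5200ms election timeout, so the leader alone is 1 of 3 and cannot count a majority. The sketch below illustrates that kind of check. It assumes lastRpcResponseTime is the elapsed time in milliseconds since the last response and that the leader counts itself. The class and method names are hypothetical and are not the Ratis API.

```java
import java.util.List;

public class LeadershipCheck {
  /**
   * Returns true if the leader plus the followers that responded within
   * the election timeout form a majority of the group.
   * Hypothetical helper for illustration, not Ratis code.
   */
  public static boolean hasMajority(List<Long> followerResponseAgesMs,
                                    long electionTimeoutMs) {
    // Count followers whose last response is recent enough.
    long responsive = followerResponseAgesMs.stream()
        .filter(age -> age <= electionTimeoutMs)
        .count();
    int groupSize = followerResponseAgesMs.size() + 1; // followers + leader
    return responsive + 1 > groupSize / 2;             // +1: the leader itself
  }

  public static void main(String[] args) {
    // Values taken from the log: timeout 5200ms, response ages 8491 and 6570.
    System.out.println(hasMajority(List.of(8491L, 6570L), 5200L)); // prints false
  }
}
```

If either follower had responded within the timeout, the same call would return true and the leader would keep its role.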

> Use separate channel for group heartbeat
> ----------------------------------------
>
>                 Key: RATIS-1465
>                 URL: https://issues.apache.org/jira/browse/RATIS-1465
>             Project: Ratis
>          Issue Type: Improvement
>            Reporter: Sammi Chen
>            Assignee: Sammi Chen
>            Priority: Major
>         Attachments: follower-hb-process-latency-with-patch.png, 
> follower-hb-process-latency.png, leader-hb-receive-latency-1.png, 
> leader-hb-receive-latency-with-patch.png
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In a cluster under heavy read/write load, the leader frequently steps down 
> because it loses heartbeats from the majority of followers. 
> The investigation shows that heartbeat processing on the follower side is 
> very quick, while heartbeat latency on the leader side is high. See the 
> attached metrics diagrams. 
> This task aims to use a separate gRPC channel for heartbeats to reduce the 
> latency introduced by network queuing. 
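
The queuing effect described in the issue can be illustrated with a toy FIFO model: a heartbeat that shares a channel with pending data messages has to wait for all of them, while a heartbeat on its own channel does not. This is only a sketch. The message counts and per-message costs are hypothetical numbers, not measurements from this cluster, and the class is not part of Ratis or gRPC.

```java
import java.util.Collections;
import java.util.List;

public class HeartbeatQueueModel {
  /**
   * Delay until a heartbeat has been sent when it waits, in FIFO order,
   * behind the messages already queued on the same channel.
   */
  public static long heartbeatDelayMs(List<Long> queuedSendCostsMs,
                                      long heartbeatCostMs) {
    long delay = heartbeatCostMs;
    for (long cost : queuedSendCostsMs) {
      delay += cost; // every earlier message must be sent first
    }
    return delay;
  }

  public static void main(String[] args) {
    // Hypothetical load: 50 pending data messages at 20ms each, heartbeat 1ms.
    List<Long> pendingData = Collections.nCopies(50, 20L);
    System.out.println("shared channel:    "
        + heartbeatDelayMs(pendingData, 1L) + "ms");
    System.out.println("dedicated channel: "
        + heartbeatDelayMs(List.of(), 1L) + "ms");
  }
}
```

In this model the shared-channel delay grows with the backlog of data messages, whereas the dedicated-channel delay depends only on the heartbeat itself.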



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
