summaryzb opened a new issue, #927:
URL: https://github.com/apache/incubator-uniffle/issues/927

   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   
   
   ### Search before asking
   
   - [X] I have searched in the 
[issues](https://github.com/apache/incubator-uniffle/issues?q=is%3Aissue) and 
found no similar issues.
   
   
   ### What would you like to be improved?
   
   When `rss.server.heartbeat.timeout` (for example 60 * 1000) is larger than 
`rss.server.heartbeat.interval` (for example 10 * 1000), assume that a single 
heartbeat request is very slow because of network delay.
   The executor service `startHeartBeat` triggers the sendHeartBeat action every 
`rss.server.heartbeat.interval` (for example 10 * 1000) milliseconds.
   The executor service `sendHeartBeat` performs the actual heartbeat RPC 
request. However, while that RPC request is waiting to time out, its thread in 
the executor service `sendHeartBeat` is blocked. Since no second thread is 
available, the following send requests wait in the blocking queue.
   As a result, the server sends only one heartbeat request in 60 * 1000 
milliseconds. The coordinator then deletes the server, because the gap between 
server heartbeats exceeds `rss.coordinator.server.heartbeat.timeout`.
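
   A minimal sketch of the backlog described above, not the actual Uniffle 
code. It assumes the `sendHeartBeat` pool has a single thread, simulates the 
slow RPC with `Thread.sleep`, and scales the timings down (100 ms interval, 
600 ms blocked request). The class and method names are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class HeartbeatBacklogDemo {

  // Returns how many simulated heartbeat RPCs actually started within windowMs.
  static int countStartedHeartbeats(long intervalMs, long rpcBlockMs, long windowMs)
      throws InterruptedException {
    // Stand-in for the `startHeartBeat` scheduler: fires once per interval.
    ScheduledExecutorService startHeartBeat = Executors.newSingleThreadScheduledExecutor();
    // Stand-in for the `sendHeartBeat` pool (assumed here to have one thread).
    ExecutorService sendHeartBeat = Executors.newSingleThreadExecutor();
    AtomicInteger started = new AtomicInteger();

    startHeartBeat.scheduleAtFixedRate(
        () -> sendHeartBeat.submit(() -> {
          started.incrementAndGet();
          try {
            // Simulated RPC that hangs until its (long) timeout expires.
            Thread.sleep(rpcBlockMs);
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        }),
        0, intervalMs, TimeUnit.MILLISECONDS);

    Thread.sleep(windowMs);
    // Stop the scheduler first, then drop whatever is still queued.
    startHeartBeat.shutdownNow();
    sendHeartBeat.shutdownNow();
    return started.get();
  }

  public static void main(String[] args) throws InterruptedException {
    // Slow RPC: later ticks pile up in the queue behind the blocked thread.
    System.out.println("slow rpc, started: " + countStartedHeartbeats(100, 600, 500));
    // Fast RPC: every tick gets to run.
    System.out.println("fast rpc, started: " + countStartedHeartbeats(100, 10, 500));
  }
}
```

   With the slow request, the ticks that fire while the worker thread is blocked 
only queue up, so the coordinator would see a long gap between heartbeats.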
   
   ### How should we improve?
   
   1. Require `rss.server.heartbeat.timeout` to be no more than 
`rss.server.heartbeat.interval`. 
   2. Eliminate `rss.server.heartbeat.timeout` and use 
`rss.server.heartbeat.interval` in its place.
   
   I prefer the second solution.
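
   A rough sketch of what the second option could look like, under the same 
assumptions as before (single-thread `sendHeartBeat` pool, RPC simulated with 
`Thread.sleep`, scaled-down timings, hypothetical names). Each request is 
bounded by the heartbeat interval and cancelled when it overruns, so the worker 
thread is free again for the next tick. This is not the actual patch.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedHeartbeatDemo {

  // Returns how many heartbeat attempts started within windowMs when each
  // attempt is given at most intervalMs before it is cancelled.
  static int countAttempts(long intervalMs, long slowRpcMs, long windowMs)
      throws InterruptedException {
    ScheduledExecutorService startHeartBeat = Executors.newSingleThreadScheduledExecutor();
    ExecutorService sendHeartBeat = Executors.newSingleThreadExecutor();
    AtomicInteger attempts = new AtomicInteger();

    startHeartBeat.scheduleAtFixedRate(() -> {
      Future<?> rpc = sendHeartBeat.submit(() -> {
        attempts.incrementAndGet();
        try {
          Thread.sleep(slowRpcMs); // simulated slow heartbeat RPC
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
      try {
        // The interval doubles as the request timeout (option 2).
        rpc.get(intervalMs, TimeUnit.MILLISECONDS);
      } catch (TimeoutException e) {
        rpc.cancel(true); // interrupt the stuck request and free the worker
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } catch (ExecutionException e) {
        // A failed heartbeat is simply retried on the next tick.
      }
    }, 0, intervalMs, TimeUnit.MILLISECONDS);

    Thread.sleep(windowMs);
    startHeartBeat.shutdownNow();
    sendHeartBeat.shutdownNow();
    return attempts.get();
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("attempts: " + countAttempts(100, 600, 500));
  }
}
```

   Cancelling with `cancel(true)` only helps if the underlying request reacts 
to interruption. A real RPC client would more likely need its own deadline set 
to the interval.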
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.