summaryzb opened a new issue, #927: URL: https://github.com/apache/incubator-uniffle/issues/927
### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)

### Search before asking

- [X] I have searched in the [issues](https://github.com/apache/incubator-uniffle/issues?q=is%3Aissue) and found no similar issues.

### What would you like to be improved?

Suppose `rss.server.heartbeat.timeout` (for example 60 * 1000) is larger than `rss.server.heartbeat.interval` (for example 10 * 1000), and a single heartbeat request is very slow because of network delay.

The executor service `startHeartBeat` triggers the send-heartbeat action every `rss.server.heartbeat.interval` (for example 10 * 1000) milliseconds. The executor service `sendHeartBeat` performs the actual heartbeat RPC request. When that RPC request hangs until it times out, the thread in `sendHeartBeat` is blocked. Because no other thread is available, subsequent send requests wait in the blocking queue.

As a result, the server sends only one heartbeat request in 60 * 1000 milliseconds. This causes the coordinator to remove the server, because the time since the server's last heartbeat exceeds `rss.coordinator.server.heartbeat.timeout`.

### How should we improve?

1. Require that `rss.server.heartbeat.timeout` be no larger than `rss.server.heartbeat.interval`.
2. Eliminate `rss.server.heartbeat.timeout` and use `rss.server.heartbeat.interval` in its place.

I prefer the second option.

### Are you willing to submit PR?

- [X] Yes I am willing to submit a PR!
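To make the queueing effect described in this issue concrete, here is a minimal, deterministic sketch in Java. It is not Uniffle code: the class `HeartbeatGapModel`, the method `maxDeliveryGapMs`, and the 100 ms duration of a healthy RPC are all hypothetical. It assumes a single sender thread with a FIFO queue and models time arithmetically instead of starting real threads.

```java
// Hypothetical model, not taken from the Uniffle code base.
// Assumes ONE sender thread: a hung RPC delays every heartbeat queued behind it.
final class HeartbeatGapModel {

    /**
     * Returns the largest gap (ms) between two heartbeats that actually
     * reach the coordinator, when the heartbeat at slowIndex hangs until
     * the RPC timeout and every other heartbeat takes fastRpcMs.
     */
    static long maxDeliveryGapMs(long intervalMs, long rpcTimeoutMs,
                                 long fastRpcMs, int slowIndex, int ticks) {
        long workerFreeAt = 0;   // time at which the sender thread is idle again
        long lastDelivered = -1; // completion time of the last successful heartbeat
        long maxGap = 0;
        for (int k = 0; k < ticks; k++) {
            long submittedAt = k * intervalMs;                   // scheduler tick
            long startedAt = Math.max(submittedAt, workerFreeAt); // wait in queue if busy
            boolean hangs = (k == slowIndex);
            long finishedAt = startedAt + (hangs ? rpcTimeoutMs : fastRpcMs);
            workerFreeAt = finishedAt;
            if (!hangs) { // a timed-out request never reaches the coordinator
                if (lastDelivered >= 0) {
                    maxGap = Math.max(maxGap, finishedAt - lastDelivered);
                }
                lastDelivered = finishedAt;
            }
        }
        return maxGap;
    }

    public static void main(String[] args) {
        // Values from the issue: interval 10 * 1000, timeout 60 * 1000.
        System.out.println(maxDeliveryGapMs(10_000, 60_000, 100, 1, 10)); // prints 70000
        // Second proposal: the timeout equals the interval.
        System.out.println(maxDeliveryGapMs(10_000, 10_000, 100, 1, 10)); // prints 20000
    }
}
```

Under these assumptions, one hung request stretches the gap seen by the coordinator to 70 seconds when the timeout is 60 * 1000, but only to 20 seconds when the timeout is capped at the interval. Whether the server is removed then depends on the configured `rss.coordinator.server.heartbeat.timeout`.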
