runzhiwang commented on pull request #383: URL: https://github.com/apache/incubator-ratis/pull/383#issuecomment-761709658
@szetszwo Hi, with leader lease enabled the CI becomes flaky, for two reasons:

1. The CI machines have low performance; sometimes a single RPC call takes more than 20ms from send to receive.
2. In the current Ratis implementation, the leader sends either log entries or a heartbeat to the follower every 75ms; if there are log entries to send, the leader does not send a separate heartbeat. For example, at 0ms there is a log entry and the leader sends it to the follower, at 75ms there is another log entry and the leader sends it, ..., and at 750ms the same happens again. So from 0ms to 750ms there are always log entries, and the leader always sends log entries and never a heartbeat. But if each log entry needs more than 1000ms for WriteDisk and applyTransaction, the leader receives no reply from 0ms to 1000ms, so the leader lease frequently becomes invalid.

I think we have the following options:

1. Increase rpc.timeout.min from 150ms to 1500ms.
2. Disable leader lease by default, so the CI does not need to take leader lease into account.

What do you think?
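To make the timing argument concrete, here is a rough simulation sketch. It is not Ratis code. It assumes a simplified model in which the lease length equals rpc.timeout.min, the lease is extended to "send time + lease length" whenever a reply arrives, and requests are pipelined every 75ms. The function name `lease_valid_ms` and its parameters are hypothetical and exist only for illustration.

```python
def lease_valid_ms(send_interval_ms, reply_delay_ms, lease_ms, horizon_ms):
    """Count the milliseconds in [0, horizon_ms) during which the lease is valid.

    Simplified model (assumption, not the actual Ratis logic):
    - the leader sends a request at 0, send_interval_ms, 2 * send_interval_ms, ...
    - each reply arrives reply_delay_ms after its request was sent
    - a reply extends the lease to (send time + lease_ms)
    """
    valid_until = 0  # the leader starts without a lease
    valid = 0
    for t in range(horizon_ms):
        sent_at = t - reply_delay_ms
        # A reply arrives at time t if a request was sent at time sent_at.
        if sent_at >= 0 and sent_at % send_interval_ms == 0:
            valid_until = max(valid_until, sent_at + lease_ms)
        if t < valid_until:
            valid += 1
    return valid


# Fast follower (20ms round trip), 150ms lease: valid almost all the time.
print(lease_valid_ms(75, 20, 150, 3000))
# Slow follower (1000ms per reply), 150ms lease: every reply arrives too late.
print(lease_valid_ms(75, 1000, 150, 3000))
# Slow follower, lease raised to 1500ms (option 1): valid once replies start arriving.
print(lease_valid_ms(75, 1000, 1500, 3000))
```

Under these assumptions, a 150ms lease never becomes valid when each reply takes 1000ms, because every reply refers to a send time that is already more than 150ms in the past. Raising the value to 1500ms keeps the lease valid once the first reply arrives.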
