morningman opened a new pull request #1581: Fix bug that cluster balance may cause load job failed
URL: https://github.com/apache/incubator-doris/pull/1581

The bug is described in issue #1580. This patch fixes two cluster balance cases:

1. After a new replica has been added, its version may not yet have caught up with the visible version, so it may be treated as a stale, redundant replica and deleted at the next tablet checking round. I add a mark named `needFurtherRepair` to the newly added replica, set only when the replica's version has not caught up with the visible version. Such a replica receives a further repair at the next tablet checking round instead of being deleted.

2. A redundant replica may still have load jobs running on it when it is deleted, and deleting it can cause those load jobs to fail. So before deleting a redundant replica, I first mark a txn id on that replica and set the replica's state to CLONE. The CLONE state ensures that no new load jobs are sent to that replica, and we wait for all load jobs before the marked txn id to finish. After that, the replica can be deleted safely.

ISSUE: #1580
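
The two cases can be illustrated with a minimal, self-contained sketch. It is not the code in this patch: the class `BalanceSketch`, the `Action` enum, the field `watermarkTxnId`, and the methods `onCloneFinished` and `checkReplica` are hypothetical names chosen for illustration, and the model assumes txn ids increase monotonically. Only `needFurtherRepair` and the CLONE state come from the description above.

```java
/** Hypothetical, simplified model of the two fixes; not the actual Doris classes. */
class BalanceSketch {
    enum ReplicaState { NORMAL, CLONE }

    /** What a tablet checking round decides to do with one replica. */
    enum Action { KEEP, REPAIR, MARK_FOR_DELETE, WAIT, DELETE }

    static class Replica {
        final long id;
        long version;
        ReplicaState state = ReplicaState.NORMAL;
        boolean needFurtherRepair = false;
        long watermarkTxnId = -1L; // -1 means "not marked for deletion" (assumed convention)

        Replica(long id, long version) {
            this.id = id;
            this.version = version;
        }
    }

    /** Case 1: flag a freshly added replica only if it lags behind the visible version. */
    static void onCloneFinished(Replica replica, long visibleVersion) {
        replica.needFurtherRepair = replica.version < visibleVersion;
    }

    /**
     * One checking round for a single replica.
     *
     * @param redundant          whether the checker considers this replica redundant
     * @param nextTxnId          the id the next transaction would receive
     * @param oldestRunningTxnId smallest id among unfinished transactions,
     *                           or Long.MAX_VALUE if none are running
     */
    static Action checkReplica(Replica replica, boolean redundant,
                               long nextTxnId, long oldestRunningTxnId) {
        // Case 1: a lagging new replica is repaired, never deleted as stale.
        if (replica.needFurtherRepair) {
            return Action.REPAIR;
        }
        if (!redundant) {
            return Action.KEEP;
        }
        // Case 2, step 1: record a txn id and switch to CLONE so that
        // no new load jobs are routed to this replica.
        if (replica.watermarkTxnId < 0) {
            replica.watermarkTxnId = nextTxnId;
            replica.state = ReplicaState.CLONE;
            return Action.MARK_FOR_DELETE;
        }
        // Case 2, step 2: wait until every txn before the marked id has finished.
        if (oldestRunningTxnId < replica.watermarkTxnId) {
            return Action.WAIT;
        }
        return Action.DELETE;
    }
}
```

In this sketch the deletion is spread over several checking rounds: the first round only marks the replica, later rounds return `WAIT` while any transaction older than the marked txn id is still running, and `DELETE` is returned once none remain.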