morningman opened a new pull request #1581: Fix bug that cluster balance may cause load job failed
URL: https://github.com/apache/incubator-doris/pull/1581
 
 
   The bug is described in issue #1580. This patch fixes two cases in cluster
   balance:
   
   1. After a new replica has been added, its version may not yet have caught
   up with the visible version. The new replica may then be treated as a stale,
   redundant replica and be deleted in the next tablet checking round.
   
       I add a mark named `needFurtherRepair` to the newly added replica, but
   only when that replica has not caught up with the visible version. Such a
   replica will receive a further repair in the next tablet checking round
   instead of being deleted.
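A minimal Java sketch of the `needFurtherRepair` idea, assuming a heavily simplified model in which a tablet checker only compares each replica's version with the visible version. The class `NeedFurtherRepairSketch`, the enum `Action`, and the methods `onCloneFinished` and `check` are hypothetical illustrations, not the actual Doris code; real redundancy decisions also depend on replica counts, backend health, and more.

```java
// Hypothetical, simplified model of the fix; names are illustrative, not Doris code.
class NeedFurtherRepairSketch {

    enum Action { KEEP, REPAIR, DELETE_AS_REDUNDANT }

    static class Replica {
        final long id;
        long version;               // last version this replica has loaded
        boolean needFurtherRepair;  // set when a newly added replica still lags behind

        Replica(long id, long version) {
            this.id = id;
            this.version = version;
        }
    }

    // Called once adding a new replica (e.g. by clone) has finished.
    static void onCloneFinished(Replica replica, long visibleVersion) {
        // Mark the replica only if it has not caught up with the visible version.
        replica.needFurtherRepair = replica.version < visibleVersion;
    }

    // Called for each replica in a (simplified) tablet checking round.
    static Action check(Replica replica, long visibleVersion) {
        if (replica.version >= visibleVersion) {
            replica.needFurtherRepair = false;  // caught up, so clear the mark
            return Action.KEEP;
        }
        // A lagging replica is repaired if marked; otherwise it is a deletion candidate.
        return replica.needFurtherRepair ? Action.REPAIR : Action.DELETE_AS_REDUNDANT;
    }

    public static void main(String[] args) {
        long visibleVersion = 10;
        Replica fresh = new Replica(1, 8);  // newly added, lagging by two versions
        Replica stale = new Replica(2, 8);  // old replica, lagging by two versions
        onCloneFinished(fresh, visibleVersion);
        System.out.println(check(fresh, visibleVersion));  // REPAIR
        System.out.println(check(stale, visibleVersion));  // DELETE_AS_REDUNDANT
    }
}
```

The point of the mark is that two replicas with the same lagging version are handled differently: only the one that was just added is given another repair round.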
   
   2. When a redundant replica is deleted, there may still be load jobs running
   on it. Deleting such a replica may cause those load jobs to fail.
   
       So before deleting a redundant replica, I first mark a txn id on that
   replica and set its state to CLONE. The CLONE state ensures that no new load
   jobs will run on that replica, and we then wait for all load jobs started
   before the marked txn id to finish. After that, the replica can be deleted
   safely.
   
   ISSUE: #1580

----------------------------------------------------------------
This is an automated message from the Apache Git Service.