[
https://issues.apache.org/jira/browse/BOOKKEEPER-745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009546#comment-14009546
]
Ivan Kelly commented on BOOKKEEPER-745:
---------------------------------------
bq. Can we move waitIfLedgerReplicationDisabled(); above
generateBookie2LedgersIndex()? That way the bk2ledger index would be built after
replication is enabled; otherwise it may continue with the old index. I also
feel it would be helpful later when doing the IP to hostname metadata changes.
The reason I put it after generateBookie2LedgersIndex() is that this method
can run for a long time. It could still be running when a rolling restart
begins, and the ledgers would then be marked while autoreplication is disabled.
Putting the wait after it, and letting the bk2ledger map be a little stale, is
ok because we are only looking for the ledgers which are on the bookie that
failed. No new ledgers will be added to that bookie after it has failed, so we
still get the same list of ledgers. If another bookie fails, the bookie check
will run again after autoreplication is reenabled.
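To illustrate why a slightly stale map is harmless here, this is a minimal sketch rather than the code from the patch. The class StaleIndexSketch, the method ledgersToMark, and the bookie names and ledger ids are all made up for the example. It assumes that ledgers created while the auditor waits only land on live bookies.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of the bookie check; not the actual Auditor code.
class StaleIndexSketch {

    // Returns the ledgers that the index places on bookies which are not available.
    static Set<Long> ledgersToMark(Map<String, Set<Long>> bookie2Ledgers,
                                   Set<String> availableBookies) {
        Set<Long> toMark = new TreeSet<>();
        for (Map.Entry<String, Set<Long>> e : bookie2Ledgers.entrySet()) {
            if (!availableBookies.contains(e.getKey())) {
                toMark.addAll(e.getValue());
            }
        }
        return toMark;
    }

    public static void main(String[] args) {
        // Index built before the wait, with bookie3 already failed.
        Map<String, Set<Long>> stale = new HashMap<>();
        stale.put("bookie1", new HashSet<>(Arrays.asList(1L, 2L)));
        stale.put("bookie2", new HashSet<>(Arrays.asList(2L, 3L)));
        stale.put("bookie3", new HashSet<>(Arrays.asList(1L, 3L)));

        // What a fresh index would look like after the wait: ledger 4 was
        // created in the meantime, but only on the bookies that are still up.
        Map<String, Set<Long>> fresh = new HashMap<>();
        for (Map.Entry<String, Set<Long>> e : stale.entrySet()) {
            fresh.put(e.getKey(), new HashSet<>(e.getValue()));
        }
        fresh.get("bookie1").add(4L);
        fresh.get("bookie2").add(4L);

        Set<String> available = new HashSet<>(Arrays.asList("bookie1", "bookie2"));

        // Both indexes yield the same ledgers for the failed bookie.
        System.out.println(ledgersToMark(stale, available)
                .equals(ledgersToMark(fresh, available)));
    }
}
```

The stale and the fresh index differ only in entries for live bookies, which the check never reads, so the set of ledgers to mark is identical.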
I've addressed the rest of the comments.
> Fix for false reports of ledger unreplication during rolling restarts.
> ----------------------------------------------------------------------
>
> Key: BOOKKEEPER-745
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-745
> Project: Bookkeeper
> Issue Type: Bug
> Components: bookkeeper-auto-recovery
> Reporter: Ivan Kelly
> Assignee: Ivan Kelly
> Fix For: 4.3.0, 4.2.3
>
> Attachments:
> 0001-Fix-for-false-reports-of-ledger-unreplication-.trunk.patch,
> 0001-Fix-for-false-reports-of-ledger-unreplication-.trunk.patch,
> 0002-Fix-for-false-reports-of-ledger-unreplication-.trunk.patch,
> 0004-Fix-for-false-reports-of-ledger-unreplication-.trunk.patch,
> 0006-Fix-for-false-reports-of-ledger-unreplicat.branch4.2.patch
>
>
> The bug occurred because the auditor did not check whether rereplication was
> enabled when it came online. When the auditor comes online it checks which
> bookies are up and marks the ledgers on missing bookies as underreplicated.
> In the false report case, the auditor was running after each bookie was
> bounced, due to the way leader election for the auditor works. Since one
> bookie was down at that point (the one being bounced), all ledgers on that
> bookie got marked as underreplicated.
--
This message was sent by Atlassian JIRA
(v6.2#6252)