[
https://issues.apache.org/jira/browse/BOOKKEEPER-278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13451787#comment-13451787
]
Rakesh R commented on BOOKKEEPER-278:
-------------------------------------
Hi Ivan, Thanks for the reviews.
bq. It would be better to disable at the auditor level.
I am a bit confused by your comment. The patch already does the disabling at
the Auditor level as well as at the RW level. Could you please give more details?
The proposed patch delays the replication processes (Auditor and RW) by making
them wait on a CountDownLatch. The logic I've followed is:
# The admin issues the disable call, which creates the 'disable' znode under
the /underreplication root node.
# When the Auditor receives a failure notification, it gets the lost
bookie/ledgers; if it sees the "disable" znode during
"markLedgerUnderreplicated", it adds a znode watcher and enters the WAITING
state.
# Likewise, if the RW sees the "disable" znode when it calls
'getLedgerToRereplicate', it adds a znode watcher and enters the WAITING state.
The CountDownLatch performs a blocking wait, which suspends both the Auditor
and the RW processes. Once re-replication is enabled again (the "disable" znode
is deleted), both the Auditor and the RW continue with the previously populated
data.
Entering the waiting state:
{code}
if (null != zkc.exists(basePath + '/' + DISABLE_NODE, w)) {
    LOG.info("Automatic ledger re-replication is disabled "
            + "by Administrator!. So waiting until its enabled.");
    // Block until the watcher counts the latch down.
    changedLatch.await();
}
{code}
Leaving the otherwise indefinite wait, which happens only after the "disable" znode is deleted:
{code}
if (e.getType() == Watcher.Event.EventType.NodeDeleted) {
    changedLatch.countDown();
}
{code}
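To make the wait/release behaviour in the two snippets above easier to try out, here is a minimal sketch that runs without ZooKeeper. It is not taken from the patch: the class DisableGateSketch and its methods disable(), enable() and awaitEnabled() are hypothetical names, and an in-memory latch stands in for the "disable" znode and its watcher. It also uses a timed await so that the waiting thread can log periodically, whereas the patch uses the untimed changedLatch.await().

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical in-memory stand-in for the "disable" znode; not BookKeeper code.
class DisableGateSketch {
    private final Object lock = new Object();
    // Non-null while the (simulated) "disable" node exists.
    private CountDownLatch changedLatch;

    // Admin side: corresponds to creating the "disable" znode.
    void disable() {
        synchronized (lock) {
            if (changedLatch == null) {
                changedLatch = new CountDownLatch(1);
            }
        }
    }

    // Admin side: corresponds to deleting the znode (the NodeDeleted event).
    void enable() {
        synchronized (lock) {
            if (changedLatch != null) {
                changedLatch.countDown();
                changedLatch = null;
            }
        }
    }

    // Auditor/RW side: returns true once enabled, false if the timeout expires first.
    boolean awaitEnabled(long timeout, TimeUnit unit) throws InterruptedException {
        CountDownLatch latch;
        synchronized (lock) {
            latch = changedLatch;
        }
        return latch == null || latch.await(timeout, unit);
    }

    public static void main(String[] args) throws InterruptedException {
        DisableGateSketch gate = new DisableGateSketch();
        gate.disable();
        // The worker plays the role of the Auditor or RW.
        Thread worker = new Thread(() -> {
            try {
                while (!gate.awaitEnabled(1, TimeUnit.SECONDS)) {
                    System.out.println("re-replication disabled, still waiting");
                }
                System.out.println("re-replication enabled, resuming");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        Thread.sleep(100);
        gate.enable(); // simulated deletion of the "disable" node
        worker.join();
    }
}
```

In the real patch the release is driven by the ZooKeeper NodeDeleted event rather than by a direct enable() call; the sketch only illustrates that a waiter blocked on the latch resumes with its existing state once the latch is counted down.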
> Ability to disable auto recovery temporarily
> --------------------------------------------
>
> Key: BOOKKEEPER-278
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-278
> Project: Bookkeeper
> Issue Type: Sub-task
> Components: bookkeeper-auto-recovery
> Affects Versions: 4.0.0
> Reporter: Ivan Kelly
> Assignee: Rakesh R
> Fix For: 4.2.0
>
> Attachments: BOOKKEEPER-278.patch
>
>
> Administrators will need to do rolling upgrades of bookies. If auto recovery
> is enabled during a rolling upgrade, there will be a lot of thrashing of
> ledgers as their recovery gets kicked off. Therefore we need a way to
> temporarily disable it.