[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459582#comment-13459582
 ] 

Rakesh R commented on BOOKKEEPER-278:
-------------------------------------

Thanks Ivan and Uma for your time and responses. Could you please go through 
the following? I would like to know your opinion.

@Ivan
bq.is that the markLedgerUnderreplicated is blocking
Yup, it's a blocking call, and the latch waits indefinitely if it sees a 
'disable' znode.

bq.but there will be a number call calls to it queued up once it is unblocked.
I hope you are pointing to the multiple bookie failure notifications that are 
queued in the 'bookieNotifications' queue.

As we know, the Auditor receives bookie failure notifications only through 
the getChildren() watcher. When the Auditor enters the waiting state, it is 
blocked in markLedgerUnderreplicated(), and consequently the run() method will 
not finish until the 'enable' notification is received. Since the Auditor has 
registered only one getChildren() zk watcher before entering the waiting 
state, it will receive at most one bookie failure notification and will not 
see further failures (because the watcher has already fired and is not 
re-registered). After enabling, it fetches the available bookies anyway, 
recalculates the lost bookies, and continues the cycle. Am I missing anything?
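
To illustrate the one-shot watcher reasoning above, here is a rough sketch. It 
is a toy in-memory model, not the ZooKeeper client API and not code from the 
patch: BookieRegistry, notificationsWhileBlocked and the bookie names are all 
made up for illustration. It only assumes that a child watch fires once and 
stays silent until it is registered again.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model of a one-shot child watch; NOT the ZooKeeper client API. */
class OneShotWatchDemo {

    /** Hypothetical stand-in for the znode that lists the available bookies. */
    static class BookieRegistry {
        private final Set<String> bookies = new TreeSet<>();
        private Runnable watcher; // at most one pending watch, cleared when it fires

        void add(String bookie) {
            bookies.add(bookie);
        }

        /** Returns the current children and arms a single-use watch (may be null). */
        List<String> getChildren(Runnable watch) {
            watcher = watch;
            return new ArrayList<>(bookies);
        }

        /** Removes a bookie and fires the pending watch, if any, exactly once. */
        void remove(String bookie) {
            bookies.remove(bookie);
            Runnable toFire = watcher;
            watcher = null; // one-shot: nothing fires again until re-registration
            if (toFire != null) {
                toFire.run();
            }
        }
    }

    /** Arms one watch, then fails several bookies without re-registering it. */
    static int notificationsWhileBlocked(BookieRegistry registry, String... failed) {
        AtomicInteger notifications = new AtomicInteger();
        registry.getChildren(notifications::incrementAndGet);
        for (String bookie : failed) {
            registry.remove(bookie);
        }
        return notifications.get();
    }

    public static void main(String[] args) {
        BookieRegistry registry = new BookieRegistry();
        registry.add("bookie-1");
        registry.add("bookie-2");
        registry.add("bookie-3");

        // What the auditor knew before it blocked.
        Set<String> known = new TreeSet<>(registry.getChildren(null));

        int seen = notificationsWhileBlocked(registry, "bookie-1", "bookie-2");

        // After 'enable': re-read the children and diff against the old view.
        known.removeAll(registry.getChildren(null));
        System.out.println("notifications=" + seen + " lost=" + known);
        // prints: notifications=1 lost=[bookie-1, bookie-2]
    }
}
```

Only one notification arrives for the two failures, but the diff taken after 
re-reading the children still finds both lost bookies.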

It's a good scenario. I will add one more test case: "behaviour of multiple 
bookie failures in disable mode".

bq.It would be better for the auditor to check is auto recovery is enabled 
after seeing a bookie drop, and only build the index, mark the ledgers, if it 
is enabled.
I agree to place the disable check just before processing a bookie failure. 
In that case, once the Auditor has started generating the index, it will 
finish the publishing cycle for the ledgers. Only on the next bookie failure 
notification will it enter the waiting state. Does this sound good to you?
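
A minimal sketch of that ordering, with the disable check placed only at the 
start of failure handling. This is not the patch: AuditorGateDemo, its 
methods, and the "index"/"publish" log entries are hypothetical names, and a 
CountDownLatch stands in for the wait on the 'disable' znode.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

/** Hypothetical sketch; class and method names are not taken from the patch. */
class AuditorGateDemo {
    // A latch with count 0 never blocks, which models "auto recovery enabled".
    private volatile CountDownLatch gate = new CountDownLatch(0);
    final List<String> log = Collections.synchronizedList(new ArrayList<>());

    void disable() {
        gate = new CountDownLatch(1);
    }

    void enable() {
        gate.countDown();
    }

    /** Handles one failure notification; the gate is checked only on entry. */
    void onBookieFailure(String bookie) throws InterruptedException {
        gate.await(); // waits here while disabled, before any work starts
        log.add("index:" + bookie);   // build the index
        log.add("publish:" + bookie); // mark the ledgers; never paused midway
    }

    /** Disables, delivers one failure on a worker thread, then enables. */
    static List<String> runScenario() {
        AuditorGateDemo auditor = new AuditorGateDemo();
        auditor.disable();
        Thread worker = new Thread(() -> {
            try {
                auditor.onBookieFailure("bookie-1");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        auditor.log.add("enable"); // recorded before the gate opens
        auditor.enable();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(auditor.log);
    }

    public static void main(String[] args) {
        System.out.println(runScenario());
        // prints: [enable, index:bookie-1, publish:bookie-1]
    }
}
```

Since the gate is consulted only on entry, a disable() issued while a cycle is 
running does not interrupt it; it takes effect on the next notification.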

                
> Ability to disable auto recovery temporarily
> --------------------------------------------
>
>                 Key: BOOKKEEPER-278
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-278
>             Project: Bookkeeper
>          Issue Type: Sub-task
>          Components: bookkeeper-auto-recovery
>    Affects Versions: 4.0.0
>            Reporter: Ivan Kelly
>            Assignee: Rakesh R
>             Fix For: 4.2.0
>
>         Attachments: BOOKKEEPER-278.patch
>
>
> Administrators will need to do rolling upgrades of bookies. If auto recovery 
> is enabled during a rolling upgrade, there will be a lot of thrashing of 
> ledgers as recovery gets kicked off. Therefore we need a way to 
> temporarily disable it.

