[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13270345#comment-13270345
 ] 

Ivan Kelly commented on BOOKKEEPER-237:
---------------------------------------


This doc is quite in line with what I had been thinking of. It needs to be broken
into smaller parts though, because as it stands it's quite hard to digest.

Firstly, the change to replication is really a special case of what can be
done with BOOKKEEPER-208. With BK-208, we have the concept of a write and an
ack quorum. The write quorum is the set of bookies to which an entry will be
written. The ack quorum is the number of bookies that must acknowledge the entry
before it is acked to the client application. What is proposed in your doc
seems to be exactly this, but with the write quorum always set to the ensemble size.
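To make that concrete, here is a rough sketch of the client-side API shape that
BK-208 implies, i.e. ensemble size, write quorum and ack quorum passed separately
when creating a ledger (parameter order and names are illustrative, not a final
API):

    import org.apache.bookkeeper.client.BookKeeper;
    import org.apache.bookkeeper.client.BookKeeper.DigestType;
    import org.apache.bookkeeper.client.LedgerHandle;

    public class QuorumExample {
        public static void main(String[] args) throws Exception {
            BookKeeper bk = new BookKeeper("zk1:2181");
            // ensemble = 5 bookies hold the ledger, each entry is written to
            // writeQuorum = 3 of them, and ackQuorum = 2 acks are enough to
            // complete an addEntry() call. The doc's proposal corresponds to
            // writeQuorum == ensemble size.
            LedgerHandle lh = bk.createLedger(5 /* ensemble */,
                                              3 /* write quorum */,
                                              2 /* ack quorum */,
                                              DigestType.CRC32,
                                              "secret".getBytes());
            lh.addEntry("hello".getBytes());
            lh.close();
            bk.close();
        }
    }

Shrinking the ack quorum only changes when addEntry() completes, not where the
entry ends up being stored.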

For the "Accountant", I think a better name would be "Auditor", as accountants 
have a habit of cooking the books ;). I think detection should be separated 
into two subparts, one performed by the accountant, which is elected. The other 
performed by the bookies themselves. 

 * The accountant check should examine the ZooKeeper metadata only. For each
ledger, it should check whether all the bookies in the ledger's ensembles are
available; if not, the ledger should be marked as under-replicated. As well as
the conditions you propose in 1.6, it should also run periodically (see the
sketch after these two points).

 * The bookie should check whether it can read the first and last entry of
each ledger fragment it is supposed to hold[1]. If it cannot read them, it
should mark the ledger as under-replicated. This is basically an fsck, and it
should also run periodically.
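To illustrate the accountant/auditor side of that split, a minimal sketch; the
MetadataStore and UnderreplicationQueue interfaces are hypothetical stand-ins,
not existing BookKeeper classes:

    import java.net.InetSocketAddress;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;

    // Hypothetical stand-ins for the ledger metadata store and the queue of
    // under-replicated ledgers.
    interface MetadataStore {
        Set<Long> allLedgerIds();
        // Ensembles of the ledger, keyed by the first entry id of each fragment.
        Map<Long, List<InetSocketAddress>> ensembles(long ledgerId);
    }

    interface UnderreplicationQueue {
        void markUnderreplicated(long ledgerId);
    }

    class Auditor {
        private final MetadataStore metadata;
        private final UnderreplicationQueue queue;

        Auditor(MetadataStore metadata, UnderreplicationQueue queue) {
            this.metadata = metadata;
            this.queue = queue;
        }

        // Metadata-only check, run periodically by the elected auditor: a ledger
        // is under-replicated if any bookie in any of its ensembles is not in
        // the set of currently available bookies.
        void checkOnce(Set<InetSocketAddress> liveBookies) {
            for (long ledgerId : metadata.allLedgerIds()) {
                for (List<InetSocketAddress> ensemble
                         : metadata.ensembles(ledgerId).values()) {
                    if (!liveBookies.containsAll(ensemble)) {
                        queue.markUnderreplicated(ledgerId);
                        break;
                    }
                }
            }
        }
    }

The bookie-side fsck would feed the same queue; it just replaces the
availability test with an attempt to read the relevant entries from its own
storage.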

I don't think it's necessary to maintain a mapping of bookies to ledgers. To
re-replicate a ledger, the whole ledger metadata has to be read anyway, so a
mapping of which bookies should contain the ledger is straightforward to build
on the fly.
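For example, reusing the hypothetical MetadataStore interface from the sketch
above, the bookie set for a ledger falls straight out of its ensembles:

    import java.net.InetSocketAddress;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    class LedgerBookieView {
        // Which bookies should hold some fragment of this ledger, derived on
        // the fly from the ledger's metadata rather than from a persistent
        // bookie -> ledger index.
        static Set<InetSocketAddress> bookiesFor(long ledgerId,
                                                 MetadataStore metadata) {
            Set<InetSocketAddress> bookies = new HashSet<InetSocketAddress>();
            for (List<InetSocketAddress> ensemble
                     : metadata.ensembles(ledgerId).values()) {
                bookies.addAll(ensemble);
            }
            return bookies;
        }
    }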

The re-replication process should be distributed across all bookies. Each bookie
should run a replication process which takes the first available under-replicated
ledger off the queue and re-replicates it. I don't think we should take load
balancing into consideration yet, as it complicates matters a lot.
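Something along these lines per bookie, again with purely hypothetical
interfaces for the queue and the copy step:

    // Hypothetical interfaces; the queue is assumed to hand out each ledger to
    // exactly one worker at a time (e.g. via a ZooKeeper lock).
    interface ReplicationQueue {
        long takeNextUnderreplicated() throws InterruptedException; // blocks
        void markReplicated(long ledgerId);
        void release(long ledgerId); // give it back for another bookie to retry
    }

    interface Replicator {
        void rereplicate(long ledgerId) throws Exception; // copy missing fragments
    }

    class ReplicationWorker implements Runnable {
        private final ReplicationQueue queue;
        private final Replicator replicator;
        private volatile boolean running = true;

        ReplicationWorker(ReplicationQueue queue, Replicator replicator) {
            this.queue = queue;
            this.replicator = replicator;
        }

        public void run() {
            while (running) {
                try {
                    long ledgerId = queue.takeNextUnderreplicated();
                    try {
                        replicator.rereplicate(ledgerId);
                        queue.markReplicated(ledgerId);
                    } catch (Exception e) {
                        queue.release(ledgerId); // let another bookie pick it up
                    }
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    running = false;
                }
            }
        }

        void shutdown() {
            running = false;
        }
    }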


[1] This won't necessarily be the first and last entry of the fragment, due to 
striping.
                
> Automatic recovery of under-replicated ledgers and its entries
> --------------------------------------------------------------
>
>                 Key: BOOKKEEPER-237
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-237
>             Project: Bookkeeper
>          Issue Type: New Feature
>          Components: bookkeeper-client, bookkeeper-server
>    Affects Versions: 4.0.0
>            Reporter: Rakesh R
>            Assignee: Rakesh R
>         Attachments: Auto Recovery and Bookie sync-ups.pdf
>
>
> As per the current design of BookKeeper, if one of the BookKeeper servers
> dies, there is no automatic mechanism to identify and recover under-replicated
> ledgers and their corresponding entries. This can lead to losing successfully
> written entries, which would be a critical problem in sensitive systems. This
> document describes a few proposals to overcome these limitations.
