[
https://issues.apache.org/jira/browse/BOOKKEEPER-237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284505#comment-13284505
]
Uma Maheswara Rao G commented on BOOKKEEPER-237:
------------------------------------------------
For work assignment, how about having the bookies compete for the replication
work? We already use this approach in HBase for distributed log splitting. The
idea is as follows.
The current distributed chain of watchers can identify failed bookies and add
an entry for each of them at some place in ZK. All bookies can watch that node.
Whenever a new failed-bookie entry is added, the bookies will get a
notification and can start competing for the work. The winner takes the
replication work and can also update the state of the replication under the
lock node it acquired. If the cluster restarts, the bookies can again compete
for the failed bookies' replication work. Whenever replication completes, the
winner can delete the lock entry and the failed bookie entry from ZK. In fact,
in HBase we have master coordination, but here we will depend on distributed
watching to identify failed bookies.
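To make the flow above concrete, here is a rough sketch of the competition.
It is only an illustration, not a proposed implementation: the
InMemoryCoordinator class, the "/failed/<bookie>" and ".../lock" paths, and
the state names are all made up for this example, and a lock-guarded dict
stands in for ZK. With real ZooKeeper, the lock would presumably be a znode
created under the failed-bookie entry, and the final cleanup would need to be
done atomically.

```python
import threading


class InMemoryCoordinator:
    """Hypothetical in-memory stand-in for the ZK namespace (not a ZooKeeper API)."""

    def __init__(self):
        self._nodes = {}
        self._mutex = threading.Lock()

    def add_failed_bookie(self, bookie):
        # The failure detector records a failed bookie.
        with self._mutex:
            self._nodes.setdefault("/failed/" + bookie, None)

    def try_lock(self, bookie, owner):
        # Atomic "create if absent": succeeds only if the failed entry still
        # exists and nobody holds its lock yet.
        failed = "/failed/" + bookie
        lock = failed + "/lock"
        with self._mutex:
            if failed not in self._nodes or lock in self._nodes:
                return False
            self._nodes[lock] = {"owner": owner, "state": "ACQUIRED"}
            return True

    def set_state(self, bookie, state):
        # The winner publishes replication progress under its lock node.
        with self._mutex:
            self._nodes["/failed/" + bookie + "/lock"]["state"] = state

    def complete(self, bookie):
        # Remove the lock entry and the failed bookie entry in one step.
        with self._mutex:
            self._nodes.pop("/failed/" + bookie + "/lock", None)
            self._nodes.pop("/failed/" + bookie, None)

    def failed_bookies(self):
        with self._mutex:
            return sorted(p.split("/")[2] for p in self._nodes
                          if p.count("/") == 2)


def compete(coord, me, failed_bookie, replicate):
    """Try to win the replication work; return True only for the winner."""
    if not coord.try_lock(failed_bookie, me):
        return False  # another bookie won, or the work is already done
    coord.set_state(failed_bookie, "REPLICATING")
    replicate(failed_bookie)  # placeholder for the actual re-replication
    coord.complete(failed_bookie)
    return True


def run_competition(contenders, failed_bookie):
    """Let several bookies race for one failed bookie's replication work."""
    coord = InMemoryCoordinator()
    coord.add_failed_bookie(failed_bookie)
    winners, replicated = [], []

    def worker(me):
        if compete(coord, me, failed_bookie, replicated.append):
            winners.append(me)

    threads = [threading.Thread(target=worker, args=(b,)) for b in contenders]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return winners, replicated, coord.failed_bookies()
```

Calling run_competition(["bookie-1", "bookie-2", "bookie-3"], "bookie-4")
should yield exactly one winner, whichever bookie gets there first. The sketch
does not cover a winner that crashes mid-replication; that case would need the
lock to disappear with the owner's session so the others can compete again.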
@Rakesh/Flavio, what are your thoughts on this?
> Automatic recovery of under-replicated ledgers and its entries
> --------------------------------------------------------------
>
> Key: BOOKKEEPER-237
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-237
> Project: Bookkeeper
> Issue Type: New Feature
> Components: bookkeeper-client, bookkeeper-server
> Affects Versions: 4.0.0
> Reporter: Rakesh R
> Assignee: Rakesh R
> Attachments: Auto Recovery Detection - distributed chain
> approach.doc, Auto Recovery and Bookie sync-ups.pdf
>
>
> As per the current design of BookKeeper, if one of the BookKeeper servers
> dies, there is no automatic mechanism to identify and recover the
> under-replicated ledgers and their corresponding entries. This could lead to
> losing successfully written entries, which would be a critical problem in
> sensitive systems. This document describes a few proposals to
> overcome these limitations.