[
https://issues.apache.org/jira/browse/BOOKKEEPER-237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13281048#comment-13281048
]
Flavio Junqueira commented on BOOKKEEPER-237:
---------------------------------------------
Hi Rakesh, I've had a look at the document you uploaded. I like the approach in
general, and I'd like to ask you for some clarifications:
# Just to confirm, elements in the myId list have to be deleted manually, yes?
If a node is decommissioned, then I suppose we will want to delete its entry
from the list.
# In step 2 of the monitor (managing the chain), it says that the auditor
notifies some other bookie that it needs to handle re-replication. How exactly
does this notification happen? Bookies currently don't talk to each other
directly. We would need to do this communication through zookeeper if we want
to keep bookies decoupled.
# In the description of replicators, it says that nodes will compete to
re-replicate the entries of a ledger. I like this approach because a bookie may
refrain from bidding if it is overwhelmed. I couldn't understand, though, how
the lock is created. The description says L00001_ip:port, but it is not clear
whether ip:port corresponds to the lock holder, in which case the lock znode
wouldn't be unique.
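
To make the uniqueness concern in the last point concrete, here is a minimal sketch. It does not use the real ZooKeeper client; FakeZnodeStore is a hypothetical in-memory stand-in that only mimics one property of znode creation, namely that creating an existing path fails. The /locks prefix, the function names, and the bookie addresses are assumptions made up for illustration.

```python
class NodeExists(Exception):
    """Stand-in for the error ZooKeeper raises when a path already exists."""


class FakeZnodeStore:
    """Hypothetical in-memory model of a znode namespace (not the real client)."""

    def __init__(self):
        self._nodes = {}

    def create(self, path, data=b""):
        # Like znode creation, this fails if the path is already present.
        if path in self._nodes:
            raise NodeExists(path)
        self._nodes[path] = data

    def get(self, path):
        return self._nodes[path]


def try_lock_name_includes_holder(store, ledger, bookie):
    """Lock path is L00001_ip:port, so each bookie creates a different path."""
    try:
        store.create(f"/locks/{ledger}_{bookie}")
        return True
    except NodeExists:
        return False


def try_lock_per_ledger(store, ledger, bookie):
    """Lock path depends on the ledger only; the holder is stored as data."""
    try:
        store.create(f"/locks/{ledger}", bookie.encode())
        return True
    except NodeExists:
        return False
```

With the first naming, two bookies can both "acquire" the lock for L00001 because their paths differ. With the second, only the first create succeeds, and the holder can still be read from the znode's data.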
Also, this proposal is similar to what I discussed with Roger offline. The
general idea that Roger proposed was to separate assignment of work from
actually doing the work. Assigning the work is not a heavy task, so it is fine
for a single process to do it.
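
The split between assigning work and doing it could look roughly like the sketch below. It is an assumption-laden illustration, not the proposed design: SharedTaskList is a hypothetical in-memory stand-in for task znodes kept in ZooKeeper, and assign, Worker, and capacity are names invented here. The point is that the assigner and the workers only touch the shared list, never each other.

```python
from collections import deque


class SharedTaskList:
    """Hypothetical stand-in for a set of task znodes shared through ZooKeeper."""

    def __init__(self):
        self._pending = deque()
        self._claimed = {}

    def publish(self, ledger_id):
        # Ignore ledgers that are already pending or already claimed.
        if ledger_id not in self._claimed and ledger_id not in self._pending:
            self._pending.append(ledger_id)

    def claim(self, worker_name):
        # Hand out the oldest pending ledger, or None if nothing is pending.
        if not self._pending:
            return None
        ledger_id = self._pending.popleft()
        self._claimed[ledger_id] = worker_name
        return ledger_id

    def owner(self, ledger_id):
        return self._claimed.get(ledger_id)


def assign(tasks, under_replicated):
    """Cheap step, done by a single process: publish what needs re-replication."""
    for ledger_id in under_replicated:
        tasks.publish(ledger_id)


class Worker:
    """A bookie that claims work only while it has spare capacity."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self.in_progress = []

    def poll(self, tasks):
        # An overwhelmed worker refrains from claiming anything.
        if len(self.in_progress) >= self.capacity:
            return None
        ledger_id = tasks.claim(self.name)
        if ledger_id is not None:
            self.in_progress.append(ledger_id)
        return ledger_id
```

In this sketch a worker with no spare capacity simply skips its turn, which matches the idea of a bookie refraining from bidding when it is overwhelmed.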
Roger, do you have anything to add?
> Automatic recovery of under-replicated ledgers and its entries
> --------------------------------------------------------------
>
> Key: BOOKKEEPER-237
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-237
> Project: Bookkeeper
> Issue Type: New Feature
> Components: bookkeeper-client, bookkeeper-server
> Affects Versions: 4.0.0
> Reporter: Rakesh R
> Assignee: Rakesh R
> Attachments: Auto Recovery Detection - distributed chain
> approach.doc, Auto Recovery and Bookie sync-ups.pdf
>
>
> As per the current design of BookKeeper, if one of the BookKeeper servers
> dies, there is no automatic mechanism to identify and recover the
> under-replicated ledgers and their corresponding entries. This would lead to
> losing successfully written entries, which is a critical problem in
> sensitive systems. This document tries to describe a few proposals to
> overcome these limitations.