[
https://issues.apache.org/jira/browse/BOOKKEEPER-237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13273341#comment-13273341
]
Flavio Junqueira commented on BOOKKEEPER-237:
---------------------------------------------
I'm assuming that the discussion about whether we should have a single
accountant entity still belongs in this jira, so I'll keep it here.
bq. Here, Accountant is light weight and internally one daemon inside the
elected Bookie. It would use ZK watchers for knowing Bookie failures and
timeouts from clients. (like how the ZK Leader will do). Also I feel, the level
of concurrency would get reduced.
I'm coming to realize that the main difference between what you're proposing
and my half-baked proposal is that I'm trying to get rid of the master
accountant election and have each bookie individually figure out what it has
to replicate in the case of a crash.
bq. Here, who will be creating groups and also needs to consider the group
reformation on failures.
Also, should design multiple groups and pointers to withstand multiple crashes.
Instead can we make it simple by choosing one guy for monitoring?
My proposal is based on the recipe we have proposed and used to avoid the herd
effect with ZooKeeper leader election. A naive way to do leader election with
ZooKeeper is to have everyone watch the leader znode. If the leader crashes,
then everyone receives a notification, which is unnecessary in some cases.
An alternative is the following. When a client bids for leadership, it creates
an ephemeral and sequential znode. To decide which znode to watch, the client
gets the list of ephemeral znodes and watches the one immediately before its
own according to the sequence numbers. In this setting, a crash generates only
one notification.
Here we can use a similar approach: each bookie watches, say, its predecessor
and its successor, and rebuilds the links upon receiving notifications. I'm
proposing predecessors and successors, but in reality we can create the links
in any way we want. The important observation is that we can do it in a
distributed manner.
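
As an illustration only, the predecessor/successor linking could look like the
Python sketch below. Ordering the bookies by their sorted identifiers is my
assumption for the example (any linking scheme would do), and the bookie
addresses and function names are hypothetical. It is a pure simulation with no
ZooKeeper calls.

```python
def ring_links(bookies):
    """Map each bookie to the (predecessor, successor) pair it watches."""
    ordered = sorted(bookies)
    n = len(ordered)
    return {
        b: (ordered[(i - 1) % n], ordered[(i + 1) % n])
        for i, b in enumerate(ordered)
    }


def handle_crash(bookies, crashed):
    """Return (bookies notified of the crash, links rebuilt among survivors)."""
    before = ring_links(bookies)
    # Only the two neighbours of the crashed bookie are watching it.
    notified = sorted({n for n in before[crashed] if n != crashed})
    survivors = [b for b in bookies if b != crashed]
    return notified, ring_links(survivors)


# Hypothetical bookie identifiers.
bookies = ["bk1:3181", "bk2:3181", "bk3:3181", "bk4:3181"]
notified, links = handle_crash(bookies, "bk2:3181")
print("notified:", notified)
print("new links:", links)
```

In this sketch the two notified neighbours are the ones that would work out
what needs re-replicating, with no elected master involved.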
bq. If I understand correctly, you are suggesting to provide turn off striping
and prefer to create another ledger instead of having an ensemble change. Still
recovery logic should consider ensemble reformation.
My observation is that some applications might prefer not to have automatic
ensemble healing. I was not proposing to remove the current scheme, or even to
change the default. I was just considering another option.
One alternative that has been proposed, and that I found interesting, is to
notify the application of exceptional events, like ensemble changes. Such a
mechanism can also give the application the opportunity to close the ledger
if it chooses to.
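
A toy sketch of what such a notification hook might look like, in Python.
Everything here is hypothetical (the SketchLedger class, the
on_ensemble_change callback, the bookie_failed method); none of it is an
existing BookKeeper API. It only shows the control flow: the application is
told about the failure first and may close the ledger instead of letting the
ensemble change.

```python
class SketchLedger:
    """Hypothetical stand-in for a ledger handle; not a BookKeeper class."""

    def __init__(self, ensemble, on_ensemble_change=None):
        self.ensemble = list(ensemble)
        self.closed = False
        self._on_ensemble_change = on_ensemble_change

    def close(self):
        self.closed = True

    def bookie_failed(self, failed, replacement):
        # Tell the application first, so it can react before any healing.
        if self._on_ensemble_change is not None:
            self._on_ensemble_change(self, failed)
        if self.closed:
            return  # The application opted out of the ensemble change.
        self.ensemble = [replacement if b == failed else b for b in self.ensemble]


def close_on_failure(ledger, failed_bookie):
    # An application that prefers a fresh ledger over an ensemble change.
    ledger.close()


healing = SketchLedger(["bk1", "bk2", "bk3"])
healing.bookie_failed("bk2", "bk4")

strict = SketchLedger(["bk1", "bk2", "bk3"], on_ensemble_change=close_on_failure)
strict.bookie_failed("bk2", "bk4")
```

With no callback registered the behaviour is the current default, so the
option stays opt-in.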
> Automatic recovery of under-replicated ledgers and its entries
> --------------------------------------------------------------
>
> Key: BOOKKEEPER-237
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-237
> Project: Bookkeeper
> Issue Type: New Feature
> Components: bookkeeper-client, bookkeeper-server
> Affects Versions: 4.0.0
> Reporter: Rakesh R
> Assignee: Rakesh R
> Attachments: Auto Recovery and Bookie sync-ups.pdf
>
>
> As per the current design of BookKeeper, if one of the BookKeeper server
> dies, there is no automatic mechanism to identify and recover the under
> replicated ledgers and its corresponding entries. This would lead to losing
> the successfully written entries, which will be a critical problem in
> sensitive systems. This document is trying to describe few proposals to
> overcome these limitations.