[
https://issues.apache.org/jira/browse/BOOKKEEPER-237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271457#comment-13271457
]
Rakesh R commented on BOOKKEEPER-237:
-------------------------------------
bq. If I understand your scheme correctly, then it is not exactly centralized.
An accountant could be any bookie and all bookies would bid for accountantship.
It does put the burden of the accountant on a single machine at a time, and I
wonder if we can spread the responsibility across the available machines to
balance load.
Here, the Accountant is lightweight: it runs as a single daemon inside the
elected Bookie. It would use ZK watchers to learn about Bookie failures and
timeouts from clients (much as the ZK Leader does). Also, I feel the level of
concurrency would be reduced.
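The election idea above can be sketched with a toy in-memory model. This is only an illustration, assuming the usual "lowest sequence number wins" rule; it does not talk to ZooKeeper, and the names `Election`, `register`, `fail` and `accountant` are hypothetical, not part of BookKeeper.

```python
# Toy in-memory model of accountant election; no real ZooKeeper involved.
# All names here (Election, register, fail, accountant) are hypothetical.
import itertools


class Election:
    def __init__(self):
        self._seq = itertools.count()
        # bookie id -> sequence number; stands in for an ephemeral
        # sequential node that each bidding bookie would create.
        self._candidates = {}

    def register(self, bookie):
        """A bookie bids for accountantship."""
        self._candidates[bookie] = next(self._seq)

    def fail(self, bookie):
        """Session loss: the bookie's ephemeral entry disappears."""
        self._candidates.pop(bookie, None)

    def accountant(self):
        """The live candidate with the lowest sequence number holds the role."""
        if not self._candidates:
            return None
        return min(self._candidates, key=self._candidates.get)


election = Election()
for b in ("bookie-1", "bookie-2", "bookie-3"):
    election.register(b)

first = election.accountant()
election.fail(first)            # the elected bookie dies
second = election.accountant()  # re-election picks the next bidder
```

Only one bookie carries the accountant role at a time, and any other bidder can take over once the current holder's entry goes away.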
{quote}
On a side note, I can't recall right now, but I think the accountant is
stateless, correct?
{quote}
Yes, the Accountant is stateless. When it identifies any under-replicated
ledgers, it puts them into the corresponding ZK node, and the watchers in turn
give re-replication notifications to the peer Bookies. The scheme is also able
to withstand Accountant failures and re-election.
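A minimal sketch of that stateless pass, assuming ledger metadata can be read as a map from ledger id to its bookies. The ZK node is replaced by a plain in-memory object with callbacks standing in for watchers; `find_under_replicated` and `UnderReplicatedNode` are hypothetical names used only for illustration.

```python
def find_under_replicated(ledger_ensembles, live_bookies):
    """Return {ledger_id: missing bookies} for ledgers with a dead replica.

    ledger_ensembles: dict mapping ledger id -> list of bookie ids.
    live_bookies: set of bookie ids currently registered.
    """
    result = {}
    for ledger_id, ensemble in ledger_ensembles.items():
        missing = [b for b in ensemble if b not in live_bookies]
        if missing:
            result[ledger_id] = missing
    return result


class UnderReplicatedNode:
    """Stands in for the ZK node; watchers are plain callbacks here."""

    def __init__(self):
        self.entries = {}
        self._watchers = []

    def watch(self, callback):
        self._watchers.append(callback)

    def publish(self, ledger_id, missing):
        self.entries[ledger_id] = missing
        for cb in self._watchers:
            cb(ledger_id, missing)


ledgers = {1: ["b1", "b2", "b3"], 2: ["b2", "b3", "b4"], 3: ["b1", "b3", "b4"]}
node = UnderReplicatedNode()
notified = []
node.watch(lambda ledger_id, missing: notified.append((ledger_id, missing)))

# b2 has failed. The accountant keeps no state of its own: each pass
# recomputes the result from the ledger metadata and the live bookie set.
live = {"b1", "b3", "b4"}
for ledger_id, missing in sorted(find_under_replicated(ledgers, live).items()):
    node.publish(ledger_id, missing)
```

Because the result is derived entirely from shared metadata, a newly elected accountant can rerun the same pass after a failure without inheriting anything from its predecessor.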
bq. Bookies can have multiple pointers and watch multiple nodes.
Here, who will create the groups? We would also need to consider group
reformation on failures.
Also, multiple groups and pointers would have to be designed to withstand
multiple crashes.
Instead, can we keep it simple by choosing one node for monitoring?
bq. Some applications might not want to have entries spread across multiple
bookies. They could for example turn off striping and prefer not to create
another ledger instead of having an ensemble change.
If I understand correctly, you are suggesting an option to turn off striping
and to prefer creating another ledger instead of having an ensemble change.
Still, the recovery logic should consider ensemble reformation.
Here is why I am thinking of avoiding ensemble reformation each time a bookie
goes down:
# When a slow replica goes down and the client reforms the ensemble, from which
entry will the new ensemble be formed?
# When a bookie goes down and no reformation is allowed, the ledger is the unit
of replication, so all the ledgers on that Bookie can be assigned to another
Bookie. Otherwise, I would have to go one more level down, parse each ensemble
within a ledger, and treat that as the unit of replication. Would the
tracking (re-replication) also need to be at that level?
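To illustrate the difference in the unit of replication, here is a small sketch. It assumes ledger metadata is a map from the first entry id of each ensemble to that ensemble's bookies, with one key added per reformation; the function name `affected_fragments` and the sample data are hypothetical.

```python
def affected_fragments(ledger_metadata, failed_bookie, last_entry):
    """List (first_entry, last_entry) ranges whose ensemble used failed_bookie.

    ledger_metadata: dict mapping first entry id -> ensemble (list of
    bookie ids), one key per ensemble reformation.
    last_entry: last entry id of the ledger.
    """
    starts = sorted(ledger_metadata)
    ranges = []
    for i, start in enumerate(starts):
        # A fragment ends just before the next ensemble starts.
        end = starts[i + 1] - 1 if i + 1 < len(starts) else last_entry
        if failed_bookie in ledger_metadata[start]:
            ranges.append((start, end))
    return ranges


# Without reformation: one ensemble, so the whole ledger is the unit.
fixed = {0: ["b1", "b2", "b3"]}
# With reformation: b2 replaced by b4 at entry 100, b1 replaced by b5 at 250.
reformed = {
    0: ["b1", "b2", "b3"],
    100: ["b1", "b4", "b3"],
    250: ["b5", "b4", "b3"],
}

whole = affected_fragments(fixed, "b1", last_entry=399)
parts = affected_fragments(reformed, "b1", last_entry=399)
```

With a fixed ensemble, a failed bookie maps to whole ledgers. Once reformation is allowed, the same failure maps to a list of entry ranges per ledger, and re-replication would have to be tracked per range.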
> Automatic recovery of under-replicated ledgers and its entries
> --------------------------------------------------------------
>
> Key: BOOKKEEPER-237
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-237
> Project: Bookkeeper
> Issue Type: New Feature
> Components: bookkeeper-client, bookkeeper-server
> Affects Versions: 4.0.0
> Reporter: Rakesh R
> Assignee: Rakesh R
> Attachments: Auto Recovery and Bookie sync-ups.pdf
>
>
> As per the current design of BookKeeper, if one of the BookKeeper server
> dies, there is no automatic mechanism to identify and recover the under
> replicated ledgers and its corresponding entries. This would lead to losing
> the successfully written entries, which will be a critical problem in
> sensitive systems. This document is trying to describe few proposals to
> overcome these limitations.