[ https://issues.apache.org/jira/browse/BOOKKEEPER-237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271202#comment-13271202 ]
Flavio Junqueira commented on BOOKKEEPER-237:
---------------------------------------------
{quote}
I feel we should consider all the corner cases, since WALs are too costly. Also,
we would be able to showcase BK as an efficient WAL tool.
{quote}
Agreed. I'm not saying we should discard such cases; I'm just saying that I
don't expect them to be the regular case. If you buy that they don't
constitute the regular case, then we may not want to focus on such corner cases
when it comes to optimizing for performance. More concretely, if a ledger
typically has one ensemble change or none, we really just need to copy the
entries of the faulty bookie. We do need to make sure that network fluctuations
do not cause system instability, though.
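To make "just copy the entries of the faulty bookie" concrete, here is a rough Java sketch. It assumes the ledger metadata exposes its ensemble history as a map from the first entry id of each ensemble to the ordered bookie list, and that entries are striped round-robin, so entry e lands on ensemble positions (e + i) mod ensembleSize for i < writeQuorum. FaultyBookieEntries and entriesToCopy are hypothetical names, not the BookKeeper API. With one ensemble change or none, the outer loop runs at most twice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

/**
 * Hypothetical sketch: list the entries a failed bookie stored for one ledger,
 * i.e. the entries that must be copied to a replacement bookie.
 * Assumes round-robin striping: entry e is written to ensemble positions
 * (e + i) % ensembleSize for i in [0, writeQuorum).
 */
final class FaultyBookieEntries {

    static List<Long> entriesToCopy(SortedMap<Long, List<String>> ensembles,
                                    long lastEntryId, int writeQuorum, String failed) {
        List<Long> result = new ArrayList<>();
        List<Long> starts = new ArrayList<>(ensembles.keySet());
        for (int k = 0; k < starts.size(); k++) {
            long first = starts.get(k);
            // An ensemble covers entries up to the start of the next ensemble
            // (exclusive), or up to lastEntryId for the most recent ensemble.
            long last = (k + 1 < starts.size()) ? starts.get(k + 1) - 1 : lastEntryId;
            List<String> ensemble = ensembles.get(first);
            int pos = ensemble.indexOf(failed);
            if (pos < 0) {
                continue; // the failed bookie was not part of this ensemble
            }
            int size = ensemble.size();
            for (long e = first; e <= last; e++) {
                // Keep entry e if its write set includes the failed bookie's position.
                for (int i = 0; i < writeQuorum; i++) {
                    if ((e + i) % size == pos) {
                        result.add(e);
                        break;
                    }
                }
            }
        }
        return result;
    }
}
```

For example, with ensembles {0: [A, B, C], 5: [A, D, C]}, lastEntryId 7 and writeQuorum 2, a failure of B means copying entries 0, 1, 3 and 4.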
{quote}
The only exceptional case is this: say a bookie has a few ledgers that were
successfully written, and unfortunately writes to the current ledger are timing
out. The client would re-form the ensemble and continue writing.
Here, should only this ledger be considered under-replicated, since it may end up
with partial entries, rather than handling it at the bookie level?
{quote}
Writes that haven't been acknowledged are errored out and sent to the
replacement bookie in the new ensemble, so they don't become under-replicated.
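A minimal sketch of that ensemble change, assuming the ledger's ensemble history is kept as a map from first entry id to ordered bookie list. EnsembleChange, replaceBookie and entriesToResend are hypothetical names; the real client tracks pending adds differently, so treat this only as an illustration of why unacknowledged writes end up on the replacement rather than being left under-replicated.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

/**
 * Hypothetical sketch of a client-side ensemble change. Entries up to
 * lastAckedEntry stay with the old ensemble; the new ensemble, with the failed
 * bookie swapped for its replacement, takes effect at lastAckedEntry + 1.
 */
final class EnsembleChange {

    /** Records the new ensemble in the ledger's ensemble history and returns it. */
    static List<String> replaceBookie(SortedMap<Long, List<String>> ensembles,
                                      long lastAckedEntry, String failed, String replacement) {
        List<String> next = new ArrayList<>(ensembles.get(ensembles.lastKey()));
        int pos = next.indexOf(failed);
        if (pos < 0) {
            throw new IllegalArgumentException(failed + " is not in the current ensemble");
        }
        // Same position as the failed bookie, so the striping pattern is unchanged.
        next.set(pos, replacement);
        ensembles.put(lastAckedEntry + 1, next);
        return next;
    }

    /** Submitted but unacknowledged entries: these are re-sent under the new ensemble. */
    static List<Long> entriesToResend(long lastAckedEntry, long lastSubmittedEntry) {
        List<Long> pending = new ArrayList<>();
        for (long e = lastAckedEntry + 1; e <= lastSubmittedEntry; e++) {
            pending.add(e);
        }
        return pending;
    }
}
```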
{quote}
Here, I have a few concerns:
* As you pointed out, do we need to consider multiple crashes?
Assume the bookie chain BK1->BK2->BK3->BK4->BK5. Say BK2 and BK3 die. BK4
doesn't know about BK2. It would be even more painful with many consecutive
failures.
* What if writes to the current ledger time out, as mentioned above?
Here, consider a case with intermittent network fluctuations.
* The watcher bookie might be a replica holder of that ledger.
Assume the bookie chain BK1->BK2->BK3->BK4->BK5. Say BK2 fails; BK3 would
not be able to replicate the content, as it may already be a replica holder.
{quote}
There are two ways I see to improve the naive scheme I proposed:
# Each bookie can keep a snapshot of the bookie list at the time it decided
which other bookie to watch. This way, once a bookie receives a crash
notification, it can check which bookies are gone and replicate accordingly.
In your first bullet, BK1 knows that it has to replicate both BK2 and BK3.
# Bookies can have multiple pointers and watch multiple nodes. For example, BK4
could watch both BK3 and BK5.
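A rough sketch of both refinements, assuming each bookie watches its successor in a ring ordered by its snapshot (the watch direction is an assumption; the thread does not fix it). WatchRing and its methods are hypothetical names. With the snapshot BK1..BK5 and only BK1, BK4 and BK5 still live, bookiesToReplicate returns [BK2, BK3] for BK1, matching the first-bullet example.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch: a bookie keeps the snapshot of the bookie ring taken
 * when it chose whom to watch, and may watch more than one node.
 */
final class WatchRing {

    /**
     * On a crash notification, walk forward from self in the snapshot and
     * collect every consecutive bookie that is no longer live.
     */
    static List<String> bookiesToReplicate(List<String> snapshot, String self, Set<String> live) {
        int n = snapshot.size();
        int me = snapshot.indexOf(self);
        if (me < 0) {
            throw new IllegalArgumentException(self + " is not in the snapshot");
        }
        List<String> gone = new ArrayList<>();
        for (int step = 1; step < n; step++) {
            String b = snapshot.get((me + step) % n);
            if (live.contains(b)) {
                break; // stop at the first live bookie
            }
            gone.add(b);
        }
        return gone;
    }

    /** Multiple pointers: watch both ring neighbours instead of a single one. */
    static Set<String> watchTargets(List<String> snapshot, String self) {
        int n = snapshot.size();
        int me = snapshot.indexOf(self);
        if (me < 0) {
            throw new IllegalArgumentException(self + " is not in the snapshot");
        }
        Set<String> targets = new LinkedHashSet<>();
        targets.add(snapshot.get((me - 1 + n) % n)); // predecessor
        targets.add(snapshot.get((me + 1) % n));     // successor
        targets.remove(self); // a one-bookie ring has nobody else to watch
        return targets;
    }
}
```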
{quote}
I feel under-replication detection should be centralized.
{quote}
If I understand your scheme correctly, then it is not exactly centralized. An
accountant could be any bookie, and all bookies would bid for accountantship.
With ZK leader election, you guarantee that only one takes over the role at a
time. It does put the accountant's burden on a single machine at a time,
and I wonder if we can spread the responsibility across the available machines
to balance load.
On a side note, I can't recall right now, but I think the accountant is
stateless, correct?
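For reference, the usual ZooKeeper leader-election recipe: each candidate creates an EPHEMERAL_SEQUENTIAL znode (ZooKeeper.create with CreateMode.EPHEMERAL_SEQUENTIAL) under a common parent; the lowest sequence number holds the role, and every other candidate watches only its immediate predecessor, which avoids a herd effect when the accountant dies. The sketch covers only the ordering logic, not the ZooKeeper calls; the accountant- prefix and the class name are hypothetical. The last method is an assumption illustrating the load-spreading question, not part of either proposal: hash each ledger id onto a sorted list of live bookies so every bookie owns a share of the ledgers.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of accountant election over sequential znode names. */
final class AccountantElection {

    // ZooKeeper appends a 10-digit, zero-padded counter to sequential znodes.
    static long sequenceOf(String znodeName) {
        return Long.parseLong(znodeName.substring(znodeName.length() - 10));
    }

    static List<String> bySequence(List<String> children) {
        List<String> copy = new ArrayList<>(children);
        copy.sort(Comparator.comparingLong(AccountantElection::sequenceOf));
        return copy;
    }

    /** The candidate with the lowest sequence number is the accountant. */
    static boolean isAccountant(List<String> children, String mine) {
        return bySequence(children).get(0).equals(mine);
    }

    /** The node a candidate should watch: its immediate predecessor, or none if it leads. */
    static Optional<String> nodeToWatch(List<String> children, String mine) {
        List<String> order = bySequence(children);
        int i = order.indexOf(mine);
        if (i < 0) {
            throw new IllegalArgumentException(mine + " is not a candidate");
        }
        return i == 0 ? Optional.empty() : Optional.of(order.get(i - 1));
    }

    /** Assumed alternative: spread ledgers over live bookies by hashing the ledger id. */
    static String responsibleBookie(long ledgerId, List<String> liveBookies) {
        List<String> order = new ArrayList<>(liveBookies);
        Collections.sort(order); // every bookie must agree on the same order
        return order.get(Math.floorMod(Long.hashCode(ledgerId), order.size()));
    }
}
```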
{quote}
I would like to know more about this. IMHO, we should avoid re-forming the
ensemble within a ledger and instead throw a specific exception back to the
client, so that it closes the ledger and creates a new one. The client would
still get ensemble re-formation/dynamic bookies at the ledger level. My idea is
to simplify the ledger parsing for detecting under-replicated ledger entries and
identifying target replica bookies.
{quote}
It would be nice to have a mechanism to inform the application of changes to
the system state, like ensemble changes. Right now we rely on the error codes of
operations, and in some cases, like ensemble changes, the change is transparent
to the application.
Some applications might not want to have entries spread across multiple
bookies. They could, for example, turn off striping and prefer to create
another ledger instead of having an ensemble change.
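A sketch of what such a notification hook might look like. EnsembleChangeListener, LedgerSettings and BookieFailurePolicy are hypothetical names, not existing BookKeeper classes. A listener that returns false corresponds to the proposal of closing the ledger and creating a new one instead of re-forming the ensemble; with no listener registered, the change stays transparent, as it is today. Turning off striping is modelled as ensemble size equal to write quorum, so every bookie stores every entry.

```java
import java.util.List;

/** Hypothetical callback through which the client tells the application about an ensemble change. */
interface EnsembleChangeListener {
    /** Return true to accept the change, false to have the ledger closed and rolled over instead. */
    boolean onEnsembleChange(long ledgerId, List<String> oldEnsemble, List<String> newEnsemble);
}

/** Hypothetical ledger settings; striping is off when every bookie stores every entry. */
final class LedgerSettings {
    final int ensembleSize;
    final int writeQuorum;

    LedgerSettings(int ensembleSize, int writeQuorum) {
        if (writeQuorum < 1 || writeQuorum > ensembleSize) {
            throw new IllegalArgumentException("need 1 <= writeQuorum <= ensembleSize");
        }
        this.ensembleSize = ensembleSize;
        this.writeQuorum = writeQuorum;
    }

    boolean stripingDisabled() {
        return ensembleSize == writeQuorum;
    }
}

final class BookieFailurePolicy {
    enum Action { CHANGE_ENSEMBLE, CLOSE_AND_ROLL_OVER }

    /** Decide how to react to a failed bookie once a replacement ensemble has been chosen. */
    static Action decide(long ledgerId, List<String> oldEnsemble, List<String> newEnsemble,
                         EnsembleChangeListener listener) {
        if (listener == null) {
            return Action.CHANGE_ENSEMBLE; // current behaviour: the change is transparent
        }
        return listener.onEnsembleChange(ledgerId, oldEnsemble, newEnsemble)
                ? Action.CHANGE_ENSEMBLE
                : Action.CLOSE_AND_ROLL_OVER;
    }
}
```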
> Automatic recovery of under-replicated ledgers and its entries
> --------------------------------------------------------------
>
> Key: BOOKKEEPER-237
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-237
> Project: Bookkeeper
> Issue Type: New Feature
> Components: bookkeeper-client, bookkeeper-server
> Affects Versions: 4.0.0
> Reporter: Rakesh R
> Assignee: Rakesh R
> Attachments: Auto Recovery and Bookie sync-ups.pdf
>
>
> As per the current design of BookKeeper, if one of the BookKeeper servers
> dies, there is no automatic mechanism to identify and recover the
> under-replicated ledgers and their corresponding entries. This would lead to
> losing successfully written entries, which would be a critical problem in
> sensitive systems. This document describes a few proposals to overcome
> these limitations.