[ https://issues.apache.org/jira/browse/BOOKKEEPER-742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961853#comment-13961853 ]
Rakesh R commented on BOOKKEEPER-742:
-------------------------------------
For future reference, here is the link to our earlier mailing-list discussion:
[Problem in rereplication algorithm|http://mail-archives.apache.org/mod_mbox/zookeeper-bookkeeper-dev/201403.mbox/%[email protected]%3e]
\\
Thanks [~ikelly] for the patch; overall it looks fine. I have a few clarifications:
*1)*
bq. It's the same case. But we have an autorecovery system that should stop us from ever getting to that state, because we should deal with the case where the loss of 1 more bookie would cause data to possibly be lost.

So in an ideal cluster, multiple failures are not expected at the same time. But if multiple failures do happen, it would enter the loop again, because opening the ledger would not succeed.
{code}
+ if (!isFinalEnsembleOpenAndAvailable(lh)) {
+ lh = admin.openLedger(ledgerId);
+ }
{code}
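To make the concern in 1) concrete, here is a minimal sketch of what a check like isFinalEnsembleOpenAndAvailable might test, and why a second failure still leads to a retry loop. Everything in it is hypothetical: LedgerInfo, lastEnsemble, and the rule "the ledger is not closed and every bookie in its final ensemble is reachable" are assumptions about the patch, not BookKeeper's API. simulatedOpenLedger stands in for admin.openLedger(ledgerId) and, as a further assumption, fails when fewer than ackQuorum bookies of the final ensemble are reachable; in that case the ledger is simply revisited on the next cycle.

{code:java}
import java.util.List;
import java.util.Set;

class FinalEnsembleCheckDemo {
    public static void main(String[] args) {
        LedgerInfo ledger = new LedgerInfo(1L, false, List.of("b1", "b2", "b3"), 2);
        // One bookie lost: the (simulated) open can still recover the ledger.
        System.out.println(ReplicationCycle.process(ledger, Set.of("b1", "b2")));
        // Two bookies lost: the open fails, so the ledger comes back next cycle.
        System.out.println(ReplicationCycle.process(ledger, Set.of("b1")));
    }
}

// Hypothetical, simplified ledger metadata (not the real LedgerMetadata class).
record LedgerInfo(long id, boolean closed, List<String> lastEnsemble, int ackQuorum) {}

class ReplicationCycle {
    enum Outcome { NO_OPEN_NEEDED, OPENED_AND_RECOVERED, OPEN_FAILED_RETRY_NEXT_CYCLE }

    // Assumption: "open and available" means the ledger is not closed and
    // every bookie in its final ensemble is currently reachable.
    static boolean isFinalEnsembleOpenAndAvailable(LedgerInfo lh, Set<String> available) {
        return !lh.closed() && available.containsAll(lh.lastEnsemble());
    }

    // Hypothetical stand-in for admin.openLedger(ledgerId): assumed to need at
    // least ackQuorum reachable bookies from the final ensemble to succeed.
    static boolean simulatedOpenLedger(LedgerInfo lh, Set<String> available) {
        long reachable = lh.lastEnsemble().stream().filter(available::contains).count();
        return reachable >= lh.ackQuorum();
    }

    static Outcome process(LedgerInfo lh, Set<String> available) {
        if (isFinalEnsembleOpenAndAvailable(lh, available)) {
            return Outcome.NO_OPEN_NEEDED;
        }
        return simulatedOpenLedger(lh, available)
                ? Outcome.OPENED_AND_RECOVERED
                : Outcome.OPEN_FAILED_RETRY_NEXT_CYCLE;
    }
}
{code}

With two of three bookies gone, every cycle takes the OPEN_FAILED_RETRY_NEXT_CYCLE path, which is the repeated work described above.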
\\
*2)* I couldn't see why the following synchronized block was removed in the patch. Could you explain?
{code}
void submitAuditTask() {
- synchronized (this)
{code}
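One common reason to keep synchronized on a method like submitAuditTask() is a check-then-act sequence: the method reads shared state (for example, whether an audit is already pending) and then submits work based on it. The sketch below is hypothetical (AuditScheduler, auditPending, and the executor setup are illustrative, not the Auditor's actual fields); it only shows the pattern that removing the lock could break, where two concurrent callers both see no pending audit and both submit one.

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class AuditSchedulerDemo {
    public static void main(String[] args) {
        AuditScheduler scheduler = new AuditScheduler();
        System.out.println(scheduler.submitAuditTask(() -> {})); // first submission
        scheduler.shutdown();
    }
}

// Hypothetical scheduler: the pending check and the submit must be atomic.
class AuditScheduler {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private boolean auditPending = false; // guarded by "this"
    private int submissions = 0;          // guarded by "this"

    // Without "synchronized", two threads could both read auditPending == false
    // and both submit an audit, doing the same work twice.
    synchronized boolean submitAuditTask(Runnable audit) {
        if (auditPending) {
            return false; // an audit is already queued or running, so coalesce
        }
        auditPending = true;
        submissions++;
        executor.submit(() -> {
            try {
                audit.run();
            } finally {
                markAuditDone();
            }
        });
        return true;
    }

    private synchronized void markAuditDone() {
        auditPending = false;
    }

    synchronized int submissions() {
        return submissions;
    }

    void shutdown() {
        executor.shutdown();
        try {
            executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
{code}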
\\
Also, [~ikelly], it would be great if you could have a look at BOOKKEEPER-733; it is another case where I noticed unnecessary cycles. At the time, I thought of collecting all these cases (including any unknown cases that may come up in the future) and handling them generically through return codes. Based on the return code, the caller could then decide whether to BACKOFF or to skip that ledger in subsequent cycles.
> Fix for empty ledgers losing quorum.
> ------------------------------------
>
> Key: BOOKKEEPER-742
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-742
> Project: Bookkeeper
> Issue Type: Bug
> Components: bookkeeper-auto-recovery
> Reporter: Ivan Kelly
> Assignee: Ivan Kelly
> Fix For: 4.3.0, 4.2.3
>
> Attachments: 0001-Fix-for-empty-ledgers-using-quorum.trunk.patch,
> 0003-Fix-for-empty-ledgers-using-quorum.branch4.2.patch
>
>
> If a ledger is open and empty, when a bookie in the ensemble crashes, no
> recovery will take place (because there's nothing to recover). This open,
> empty, unrepaired ledger can persist for a long time. If it loses another
> bookie, it can lose quorum. At this point it's impossible for the bookie to
> know that it's an empty ledger, and the admin gets notified of missing data.
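The gap described in the issue can be illustrated with a simplified audit check. In this hypothetical model (the Fragment and LedgerView records, entryCount, and both check methods are illustrative assumptions, not BookKeeper's auditor code), a check that only inspects fragments holding entries never flags an open, empty ledger. A check in the spirit of the fix, which also inspects the final ensemble of an open ledger, flags it as soon as one of its bookies is lost, so the ensemble can be repaired before a second failure costs quorum.

{code:java}
import java.util.List;
import java.util.Set;

class EmptyLedgerAuditDemo {
    public static void main(String[] args) {
        // An open ledger with no entries yet: a single fragment on b1, b2, b3.
        LedgerView ledger = new LedgerView(false,
                List.of(new Fragment(List.of("b1", "b2", "b3"), 0)));
        Set<String> lost = Set.of("b2");
        System.out.println(UnderReplicationCheck.entriesOnly(ledger, lost));
        System.out.println(UnderReplicationCheck.includingOpenFinal(ledger, lost));
    }
}

// Hypothetical, simplified view of a ledger and its fragments.
record Fragment(List<String> ensemble, long entryCount) {}

record LedgerView(boolean closed, List<Fragment> fragments) {}

class UnderReplicationCheck {
    static boolean touchesLostBookie(Fragment f, Set<String> lost) {
        return f.ensemble().stream().anyMatch(lost::contains);
    }

    // Models the behaviour in the issue: only fragments holding entries are
    // checked, so an open, empty ledger is never marked under-replicated.
    static boolean entriesOnly(LedgerView l, Set<String> lost) {
        return l.fragments().stream()
                .filter(f -> f.entryCount() > 0)
                .anyMatch(f -> touchesLostBookie(f, lost));
    }

    // Assumed variant: also check the final ensemble of an open ledger,
    // even when that fragment holds no entries yet.
    static boolean includingOpenFinal(LedgerView l, Set<String> lost) {
        if (entriesOnly(l, lost)) {
            return true;
        }
        List<Fragment> fs = l.fragments();
        return !l.closed() && !fs.isEmpty()
                && touchesLostBookie(fs.get(fs.size() - 1), lost);
    }
}
{code}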