[
https://issues.apache.org/jira/browse/BOOKKEEPER-742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961708#comment-13961708
]
Ivan Kelly commented on BOOKKEEPER-742:
---------------------------------------
{quote}
The problem here is that you've lost too many bookies, so it is possible that
you've lost data and consequently the ledger is bad. Say that entry has been
successfully written and acknowledged. In this case, we can't recover 1 and the
ledger is bad, but we can't distinguish the case you're describing from the one
I just presented.
{quote}
It's the same case. But we have an autorecovery system that should stop us from
ever getting to that state, because it should already handle the case where
losing 1 more bookie could cause data loss.
{quote}
Also, for completeness, I just wanted to confirm that if 3 or more entries have
been written, then we would be able to spot that the ledger is really bad
because we would be able to see entry 2, but not 1.
{quote}
In this case, even if 2 entries have been written the current code would handle
it because it would see that E2 exists on the second bookie, but not the third.
{quote}
One small point about the patch. This method isFinalEnsembleOpenAndAvailable,
which seems to be key to this patch, returns true in the case the ledger
metadata says closed. Returning true in this case is a bit misleading, and
perhaps we could rename the method to something like shouldCloseLedger and
negate the return values.
{quote}
shouldCloseLedger is too generic. I want to make it clear in the code that this
is a special case. I can rename it, but the name needs to make clear that the
method is checking for something particular.
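For anyone skimming the thread, the failure in the issue description can be
sketched as a toy model. This is not BookKeeper code: ToyLedger,
repairTriggered and hasQuorum are hypothetical names, and the rule "readable
while at least quorumSize ensemble members are alive" is a simplifying
assumption, not the real recovery protocol.

```java
import java.util.Set;

class EmptyLedgerQuorumSketch {
    /** Toy stand-in for ledger metadata; not a BookKeeper class. */
    static final class ToyLedger {
        final Set<String> ensemble;  // bookies assigned to the ledger
        final int quorumSize;        // responses assumed necessary to read ledger state
        final long entriesWritten;   // 0 means the ledger is empty

        ToyLedger(Set<String> ensemble, int quorumSize, long entriesWritten) {
            this.ensemble = ensemble;
            this.quorumSize = quorumSize;
            this.entriesWritten = entriesWritten;
        }
    }

    /** Per the issue description: with nothing to recover, a crash triggers no repair. */
    static boolean repairTriggered(ToyLedger ledger) {
        return ledger.entriesWritten > 0;
    }

    /** Assumed rule: quorum holds while enough ensemble members are still alive. */
    static boolean hasQuorum(ToyLedger ledger, Set<String> aliveBookies) {
        long alive = ledger.ensemble.stream().filter(aliveBookies::contains).count();
        return alive >= ledger.quorumSize;
    }

    public static void main(String[] args) {
        ToyLedger ledger = new ToyLedger(Set.of("b1", "b2", "b3"), 2, 0);

        // First crash (b1): the empty ledger is left unrepaired but still has quorum.
        System.out.println("repair triggered: " + repairTriggered(ledger));
        System.out.println("quorum after 1 crash: " + hasQuorum(ledger, Set.of("b2", "b3")));

        // Second crash (b2): quorum is gone, and the ledger can no longer be shown to be empty.
        System.out.println("quorum after 2 crashes: " + hasQuorum(ledger, Set.of("b3")));
    }
}
```

In this model the ensemble is never repaired after the first crash because the
ledger is empty, so the second crash is enough to lose quorum. That is the
special case the method in the patch is meant to catch.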
> Fix for empty ledgers losing quorum.
> ------------------------------------
>
> Key: BOOKKEEPER-742
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-742
> Project: Bookkeeper
> Issue Type: Bug
> Components: bookkeeper-auto-recovery
> Reporter: Ivan Kelly
> Assignee: Ivan Kelly
> Fix For: 4.3.0, 4.2.3
>
> Attachments: 0001-Fix-for-empty-ledgers-using-quorum.trunk.patch,
> 0003-Fix-for-empty-ledgers-using-quorum.branch4.2.patch
>
>
> If a ledger is open and empty, when a bookie in the ensemble crashes, no
> recovery will take place (because there's nothing to recover). This open,
> empty, unrepaired ledger can persist for a long time. If it loses another
> bookie, it can lose quorum. At this point it's impossible for the bookie to
> know that it's an empty ledger, and the admin gets notified of missing data.