[
https://issues.apache.org/jira/browse/BOOKKEEPER-742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961141#comment-13961141
]
Flavio Junqueira commented on BOOKKEEPER-742:
---------------------------------------------
I remember this discussion on the list, but I can't remember the conclusion.
This is a comment in the patch:
{noformat}
* For example, if in a E3Q2, only 1 entry is written and the last bookie
* in the ensemble fails, nothing has been written to it, so nothing needs to be
* recovered. But if the second to last bookie fails, we've now lost quorum for
* the second entry, so it's impossible to see if the second has been written or
* not.
{noformat}
The problem here is that you've lost too many bookies, so it is possible that
you've lost data and consequently the ledger is bad. Say that the second entry
(entry 1) had been successfully written and acknowledged. In that case, we can't
recover entry 1 and the ledger is bad, but we can't distinguish the case you're
describing from the one I just presented.
Also, for completeness, I just wanted to confirm that if 3 or more entries have
been written, then we would be able to spot that the ledger is really bad
because we would be able to see entry 2, but not 1.
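To make the cases above concrete, here is a minimal sketch. It assumes
round-robin striping, where entry e is written to bookies (e + i) mod E for
i = 0..Q-1, and that the last two bookies of an E3Q2 ensemble (indexes 1 and 2)
have failed. The class and method names are made up for illustration and are
not taken from the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class StripingSketch {
    // Indexes of the bookies that would hold the given entry, assuming
    // round-robin striping over the ensemble (an assumption of this sketch).
    static List<Integer> writeSet(long entryId, int ensembleSize, int writeQuorum) {
        List<Integer> bookies = new ArrayList<>();
        for (int i = 0; i < writeQuorum; i++) {
            bookies.add((int) ((entryId + i) % ensembleSize));
        }
        return bookies;
    }

    // True if at least one bookie in the entry's write set has not failed,
    // i.e. a reader could still find out whether the entry exists.
    static boolean readable(long entryId, int ensembleSize, int writeQuorum,
                            Set<Integer> failedBookies) {
        for (int bookie : writeSet(entryId, ensembleSize, writeQuorum)) {
            if (!failedBookies.contains(bookie)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<Integer> failed = Set.of(1, 2); // second to last and last bookie
        for (long entry = 0; entry < 3; entry++) {
            System.out.println("entry " + entry
                    + " -> bookies " + writeSet(entry, 3, 2)
                    + ", readable=" + readable(entry, 3, 2, failed));
        }
    }
}
```

Under these assumptions entry 1 lives only on bookies 1 and 2, so with both gone
we cannot tell whether it was ever written. Entry 0 and entry 2 each keep a copy
on bookie 0, which is why seeing entry 2 but not entry 1 shows that the ledger
is really bad.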
One small point about the patch. The method isFinalEnsembleOpenAndAvailable,
which seems to be key to this patch, returns true when the ledger metadata says
the ledger is closed. Returning true in this case is a bit misleading, and
perhaps we could rename the method to something like shouldCloseLedger and
negate the return values.
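A rough sketch of what the rename could look like. The two boolean parameters
and both method bodies are assumptions standing in for whatever the patch
actually checks; only the method names and the "returns true when closed"
behaviour come from the discussion above.

```java
class RenameSketch {
    // Stand-in for the current method (hypothetical body): true when the
    // metadata says closed, or when the final ensemble is fully available.
    static boolean isFinalEnsembleOpenAndAvailable(boolean metadataClosed,
                                                   boolean finalEnsembleAvailable) {
        return metadataClosed || finalEnsembleAvailable;
    }

    // Proposed name with negated return values: true only when the ledger
    // is still open and its final ensemble is no longer fully available.
    static boolean shouldCloseLedger(boolean metadataClosed,
                                     boolean finalEnsembleAvailable) {
        return !isFinalEnsembleOpenAndAvailable(metadataClosed, finalEnsembleAvailable);
    }
}
```

With the negated form, a closed ledger yields false, which reads more naturally
at the call site than a true result from a method named "open and available".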
> Fix for empty ledgers losing quorum.
> ------------------------------------
>
> Key: BOOKKEEPER-742
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-742
> Project: Bookkeeper
> Issue Type: Bug
> Components: bookkeeper-auto-recovery
> Reporter: Ivan Kelly
> Assignee: Ivan Kelly
> Fix For: 4.3.0, 4.2.3
>
> Attachments: 0001-Fix-for-empty-ledgers-using-quorum.trunk.patch,
> 0003-Fix-for-empty-ledgers-using-quorum.branch4.2.patch
>
>
> If a ledger is open and empty, when a bookie in the ensemble crashes no
> recovery will take place (because there's nothing to recover). This open
> empty unrepaired ledger can persist for a long time. If it loses another
> bookie, it can lose quorum. At this point it's impossible for the bookie to
> know that it's an empty ledger, and the admin gets notified of missing data.