[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961141#comment-13961141
 ] 

Flavio Junqueira commented on BOOKKEEPER-742:
---------------------------------------------

I remember this discussion on the list, but I can't remember the conclusion. 
This is a comment in the patch:

{noformat}
     * For example, if in a E3Q2, only 1 entry is written and the last bookie
     * in the ensemble fails, nothing has been written to it, so nothing needs to be
     * recovered. But if the second to last bookie fails, we've now lost quorum for
     * the second entry, so it's impossible to see if the second has been written or
     * not.
{noformat}

The problem here is that you've lost too many bookies, so it is possible that 
you've lost data and consequently the ledger is bad. Say that the second entry 
(entry 1) has been successfully written and acknowledged. In this case, we 
can't recover entry 1 and the ledger is bad, but we can't distinguish the case 
you're describing from the one I just presented.

Also, for completeness, I just wanted to confirm that if 3 or more entries have 
been written, then we would be able to spot that the ledger is really bad 
because we would be able to see entry 2, but not 1. 
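
To make the E3Q2 reasoning above concrete, here is a minimal sketch. It assumes entries are numbered from 0 and striped round-robin, so that entry e lands on bookies (e + i) mod E for i in 0..Q-1. The function names write_set and visible_entries are hypothetical and are not taken from the patch or from the BookKeeper code.

```python
def write_set(entry_id, ensemble_size, quorum_size):
    """Bookie indexes that hold entry_id, assuming round-robin striping."""
    return [(entry_id + i) % ensemble_size for i in range(quorum_size)]


def visible_entries(num_written, failed, ensemble_size=3, quorum_size=2):
    """Entries that still have at least one replica on a live bookie."""
    return [
        e
        for e in range(num_written)
        if any(b not in failed for b in write_set(e, ensemble_size, quorum_size))
    ]


# Bookies 1 and 2 (second to last and last) have both failed.
failed = {1, 2}
for written in (1, 2, 3):
    print(written, "written ->", visible_entries(written, failed))
```

With bookies 1 and 2 down, one written entry and two written entries both leave only entry 0 visible, which is the ambiguity described above. With three written entries, entries 0 and 2 are visible and the gap at entry 1 exposes the loss.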

One small point about the patch. The method isFinalEnsembleOpenAndAvailable, 
which seems to be key to this patch, returns true when the ledger metadata 
says the ledger is closed. Returning true in this case is a bit misleading, 
and perhaps we could rename the method to something like shouldCloseLedger 
and negate the return values.


> Fix for empty ledgers losing quorum.
> ------------------------------------
>
>                 Key: BOOKKEEPER-742
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-742
>             Project: Bookkeeper
>          Issue Type: Bug
>          Components: bookkeeper-auto-recovery
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>             Fix For: 4.3.0, 4.2.3
>
>         Attachments: 0001-Fix-for-empty-ledgers-using-quorum.trunk.patch, 
> 0003-Fix-for-empty-ledgers-using-quorum.branch4.2.patch
>
>
> If a ledger is open and empty, when a bookie in the ensemble crashes no 
> recovery will take place (because there's nothing to recover). This open 
> empty unrepaired ledger can persist for a long time. If it loses another 
> bookie, it can lose quorum. At this point it's impossible for the bookie to 
> know that it's an empty ledger, and the admin gets notified of missing data.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
