[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961708#comment-13961708
 ] 

Ivan Kelly commented on BOOKKEEPER-742:
---------------------------------------

{quote}
The problem here is that you've lost too many bookies, so it is possible that 
you've lost data and consequently the ledger is bad. Say that entry has been 
successfully written and acknowledged. In this case, we can't recover 1 and the 
ledger is bad, but we can't distinguish the case you're describing from the one 
I just presented.
{quote}
It's the same case. But we have an autorecovery system that should stop us from 
ever getting to that state, because it should step in as soon as the loss of 
1 more bookie could cause data to be lost.

{quote}
Also, for completeness, I just wanted to confirm that if 3 or more entries have 
been written, then we would be able to spot that the ledger is really bad 
because we would be able to see entry 2, but not 1. 
{quote}
In this case, even if 2 entries have been written the current code would handle 
it because it would see that E2 exists on the second bookie, but not the third.

{quote}
One small point about the patch. This method isFinalEnsembleOpenAndAvailable, 
which seems to be key to this patch, returns true in the case the ledger 
metadata says closed. Returning true in this case is a bit misleading, and 
perhaps we could rename the method to something like shouldCloseLedger and 
negate the return values. 
{quote}
shouldCloseLedger is too generic; I want to make it clear in the code that this 
is a special case. I can rename it, but it needs to be clear that the method is 
checking for something particular.

> Fix for empty ledgers losing quorum.
> ------------------------------------
>
>                 Key: BOOKKEEPER-742
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-742
>             Project: Bookkeeper
>          Issue Type: Bug
>          Components: bookkeeper-auto-recovery
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>             Fix For: 4.3.0, 4.2.3
>
>         Attachments: 0001-Fix-for-empty-ledgers-using-quorum.trunk.patch, 
> 0003-Fix-for-empty-ledgers-using-quorum.branch4.2.patch
>
>
> If a ledger is open and empty, when a bookie in the ensemble crashes no 
> recovery will take place (because there's nothing to recover). This open 
> empty unrepaired ledger can persist for a long time. If it loses another 
> bookie, it can lose quorum. At this point it's impossible for the bookie to 
> know that it's an empty ledger, and the admin gets notified of missing data.
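
As a rough illustration of the failure mode described above, here is a minimal 
sketch in Java. It is not BookKeeper code: the class QuorumLossSketch, its 
methods, and the simplified model (one write set, an entry counts as 
acknowledged once ackQuorum bookies hold it) are all assumptions made for this 
example. Under that model, once ackQuorum or more bookies of the write set are 
unreachable, a reader cannot tell an empty ledger from one whose only copies 
sat on the missing bookies.

```java
/** Hypothetical model of the empty-ledger quorum problem; not part of BookKeeper. */
public final class QuorumLossSketch {
    private QuorumLossSketch() {}

    /** Rejects parameter combinations that make no sense in this simplified model. */
    private static void check(int writeQuorum, int ackQuorum, int unreachable) {
        if (ackQuorum < 1 || ackQuorum > writeQuorum) {
            throw new IllegalArgumentException("need 1 <= ackQuorum <= writeQuorum");
        }
        if (unreachable < 0 || unreachable > writeQuorum) {
            throw new IllegalArgumentException("need 0 <= unreachable <= writeQuorum");
        }
    }

    /**
     * An acknowledged entry lives on at least ackQuorum bookies. If fewer than
     * ackQuorum bookies are unreachable, at least one copy of any acknowledged
     * entry is still readable, so "no reachable bookie has it" proves that the
     * entry was never acknowledged.
     */
    public static boolean canRuleOutAckedEntry(int writeQuorum, int ackQuorum, int unreachable) {
        check(writeQuorum, ackQuorum, unreachable);
        return unreachable < ackQuorum;
    }

    /** True when losing one more bookie would make the ledger state ambiguous. */
    public static boolean oneMoreLossIsUnsafe(int writeQuorum, int ackQuorum, int unreachable) {
        check(writeQuorum, ackQuorum, unreachable);
        return unreachable + 1 >= ackQuorum;
    }

    public static void main(String[] args) {
        int writeQuorum = 3; // assumed values, chosen only for illustration
        int ackQuorum = 2;
        for (int down = 0; down <= writeQuorum; down++) {
            System.out.println(down + " bookie(s) down: canRuleOutAckedEntry="
                    + canRuleOutAckedEntry(writeQuorum, ackQuorum, down));
        }
    }
}
```

In this model, oneMoreLossIsUnsafe marks the point at which repair has to 
happen. An open, empty ledger never triggers repair because there is nothing 
to copy, so it can drift past that point unnoticed.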



--
This message was sent by Atlassian JIRA
(v6.2#6252)
