[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961853#comment-13961853
 ] 

Rakesh R commented on BOOKKEEPER-742:
-------------------------------------

For future reference, here is the link to the mail discussion we had 
earlier: [Problem in rereplication 
algorithm|http://mail-archives.apache.org/mod_mbox/zookeeper-bookkeeper-dev/201403.mbox/%[email protected]%3e]

\\
Thanks [~ikelly] for the patch; overall it looks fine. I have a few questions:

*1)* bq.It's the same case. But we have an autorecovery system that should stop 
us from ever getting to that state, because we should deal with the case where 
the loss of 1 more bookie would cause data to possibly be lost.
So in an ideal cluster, multiple failures at the same time are not expected. 
If multiple failures do happen, it would again enter the loop, because opening 
the ledger wouldn't succeed.

{code}
+                    if (!isFinalEnsembleOpenAndAvailable(lh)) {
+                        lh = admin.openLedger(ledgerId);
+                    }
{code}
\\
*2)* I don't see why the following synchronized block is removed in the 
patch. Could you explain?
{code}
void submitAuditTask() {
-        synchronized (this)
{code}

\\
Also, [~ikelly], it would be great if you could have a look at BOOKKEEPER-733; 
this is another case where I noticed unnecessary cycles. At that time, I 
thought of putting all these cases together (including any unknown cases that 
may turn up in future) and handling them in a generic way through return 
codes. Based on the return code, the caller could then decide whether to 
BACKOFF or to skip that ledger in further cycles.
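
To make the return-code idea above a bit more concrete, here is a rough sketch of what I have in mind. All names in it (ReplicationOutcomeSketch, Result, Action, MAX_RETRIES) are hypothetical and do not exist in BookKeeper; the retry limit and the delays are assumed values, only meant to illustrate the BACKOFF-or-skip decision.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch only; none of these names exist in BookKeeper. */
public class ReplicationOutcomeSketch {

    /** Assumed return codes for one rereplication attempt on a ledger. */
    public enum Result { OK, TRANSIENT_FAILURE, PERMANENT_FAILURE }

    /** What to do with the ledger in the following cycles. */
    public enum Action { DONE, BACKOFF, SKIP }

    /** Assumed limit: give up on a ledger after this many failed attempts. */
    public static final int MAX_RETRIES = 3;

    /** Consecutive transient failures seen so far, keyed by ledger id. */
    private final Map<Long, Integer> failures = new HashMap<>();

    /** Maps the return code of one attempt to an action for that ledger. */
    public Action decide(long ledgerId, Result result) {
        switch (result) {
            case OK:
                failures.remove(ledgerId);
                return Action.DONE;
            case PERMANENT_FAILURE:
                return Action.SKIP;
            default: // TRANSIENT_FAILURE
                int attempts = failures.merge(ledgerId, 1, Integer::sum);
                return attempts > MAX_RETRIES ? Action.SKIP : Action.BACKOFF;
        }
    }

    /** Exponential delay before the next attempt, capped at one minute. */
    public long backoffMillis(long ledgerId) {
        int attempts = failures.getOrDefault(ledgerId, 0);
        return Math.min(1000L << Math.min(attempts, 6), 60_000L);
    }
}
```

With something like this, a ledger that keeps failing to open would be retried with growing delays and eventually skipped, instead of being picked up again in every cycle.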

> Fix for empty ledgers losing quorum.
> ------------------------------------
>
>                 Key: BOOKKEEPER-742
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-742
>             Project: Bookkeeper
>          Issue Type: Bug
>          Components: bookkeeper-auto-recovery
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>             Fix For: 4.3.0, 4.2.3
>
>         Attachments: 0001-Fix-for-empty-ledgers-using-quorum.trunk.patch, 
> 0003-Fix-for-empty-ledgers-using-quorum.branch4.2.patch
>
>
> If a ledger is open and empty, when a bookie in the ensemble crashes no 
> recovery will take place (because there's nothing to recover). This open 
> empty unrepaired ledger can persist for a long time. If it loses another 
> bookie, it can lose quorum. At this point it's impossible for the bookie to 
> know that it's an empty ledger, and the admin gets notified of missing data.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
