[
https://issues.apache.org/jira/browse/BOOKKEEPER-237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13270492#comment-13270492
]
Ivan Kelly commented on BOOKKEEPER-237:
---------------------------------------
{quote}
I have gone through BOOKKEEPER-208; is the 'write quorum' using the RRDS
algorithm for interleaving?
{quote}
Yes, but if the write quorum size is the same as the ensemble size, each
message will be sent to all bookies in the ensemble.
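The round-robin striping can be sketched as follows (a minimal illustration, not the actual client code; the class and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of round-robin striping for the write quorum:
// entry e goes to writeQuorumSize bookies starting at ensemble index (e mod E).
public class WriteQuorumSketch {
    // Returns the ensemble indices that receive entry `entryId`.
    public static List<Integer> writeSet(long entryId, int ensembleSize, int writeQuorumSize) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < writeQuorumSize; i++) {
            indices.add((int) ((entryId + i) % ensembleSize));
        }
        return indices;
    }
}
```

When writeQuorumSize equals ensembleSize, the write set covers every bookie, which matches the "sent to all" behaviour described above.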
{quote}
Also, will the ensemble be reformed if any bookie in the ensemble is slow/dead
and times out?
{quote}
In the case of a slow/dead bookie, the write can continue without issue. When
the connection to the slow/dead bookie times out, the client will try to
replace that bookie.
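The replacement step could look roughly like this (a hypothetical sketch; in practice the new ensemble is recorded in the ledger metadata via ZooKeeper):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of ensemble reformation: swap the timed-out bookie's
// address for a freshly chosen one, leaving the other members in place.
public class EnsembleReplacement {
    public static List<String> replaceBookie(List<String> ensemble,
                                             String failedBookie,
                                             String newBookie) {
        List<String> reformed = new ArrayList<>(ensemble);
        int idx = reformed.indexOf(failedBookie);
        if (idx >= 0) {
            reformed.set(idx, newBookie); // new ensemble applies from the failure point onward
        }
        return reformed;
    }
}
```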
{quote}
But I was thinking of avoiding the interleaving of entries and ensemble
reformation on write-entry failures, to make recovery simple. The idea is that,
in the best case, all bookies in the ensemble hold replicas. The ack quorum is
also set to the ensemble size. Here, assured replicas = quorum size.
Please see the doc section '1.4.4 Example: How it works'.
For any ledger:
in the best case, total replicas = ensemble size;
in the worst case, total replicas = quorum size, which is the assured/majority
replica count.
{quote}
This will behave the same if the write quorum is the same as the ensemble size
and the ack quorum is the same as the "assured replicas".
{quote}
Also, I feel this approach is able to handle intermittent network problems, as
by default it will tolerate (ensemble - quorum) failures.
{quote}
For intermittent network problems, this can be handled by setting the socket
timeout to a large value.
{quote}
How will the client read?
For an in-progress ledger, the bkclient looks to the entire ensemble for
reading.
{quote}
For a non-closed ledger, a read is sent once to all bookies in the ensemble to
get the lastEntryConfirmed. Once we have this, we read as if we were reading
from a closed ledger in which lastEntryConfirmed is the last entry.
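Recovering lastEntryConfirmed from the per-bookie responses amounts to taking the maximum; a minimal sketch (the class name and the -1 sentinel are illustrative assumptions, not the real client API):

```java
// Hypothetical sketch: to read a non-closed ledger, the client asks every
// bookie in the ensemble for its lastEntryConfirmed and keeps the maximum.
public class LastConfirmedSketch {
    public static long recoverLastConfirmed(long[] perBookieResponses) {
        long last = -1; // -1 means no entry confirmed yet (illustrative sentinel)
        for (long lc : perBookieResponses) {
            last = Math.max(last, lc);
        }
        return last;
    }
}
```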
{quote}
For a closed ledger, bkclient looks to the CLOSED ensemble for reading.
{quote}
For a closed ledger, it follows a round-robin reading pattern, as it does
today. For each entry, it tries to read from a single bookie; if the read
fails, it tries the next bookie in the round-robin order.
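The failover order can be sketched like this (a simulation of the selection logic only; `bookieUp` stands in for whether a read from that bookie succeeds, and the names are hypothetical):

```java
// Hypothetical sketch of the closed-ledger read path: start at the bookie
// selected by round-robin and fall back to the next one if the read fails.
public class RoundRobinRead {
    // bookieUp[i] simulates whether a read from ensemble member i succeeds.
    public static int pickBookie(long entryId, boolean[] bookieUp) {
        int ensembleSize = bookieUp.length;
        int start = (int) (entryId % ensembleSize);
        for (int i = 0; i < ensembleSize; i++) {
            int idx = (start + i) % ensembleSize;
            if (bookieUp[idx]) {
                return idx; // first reachable bookie serves the entry
            }
        }
        return -1; // entry unavailable: every bookie in the ensemble failed
    }
}
```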
{quote}
I was trying to avoid multiple ZK calls to read ledger metadata for the
detection. Only for the failed ledgers would the Accountant read the ledger
metadata and initiate the rereplication.
{quote}
ZK is a read-optimised system, so I think doing multiple reads on it is more
efficient than maintaining another list.
{quote}
Also, the map contains the _inprogressreplica information, so a new
Accountant would know about the already initiated rereplications.
...
Yeah, it's good. If I understand correctly, instead of assigning work to a
bookie, the Accountant would keep the list of under-replicated ledgers. A
bookie (one that does not already hold a replica of that ledger) would take an
under-replicated ledger and, after finishing, send an ack to the Accountant. I
feel we should define proper locking here to avoid concurrency problems.
{quote}
Each rereplicating ledger can have a lock on it. When a bookie goes to
rereplicate a ledger, it writes an ephemeral sequential znode as a child of the
rereplicated ledger's znode. It then checks whether it has the lowest sequence
number. If not, it deletes its znode, as this indicates that someone else is
rereplicating that particular ledger. If it does have the lowest sequence
number, it has just acquired the lock. It then checks what rereplication needs
to happen on the ledger, performs the rereplication, and finally deletes its
lock and then the ledger's rereplication znode. I think this also removes the
need for _inprogress znodes.
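The lock recipe above can be sketched with a simulation (a TreeMap stands in for the children of the ledger's znode; real code would use ZooKeeper ephemeral sequential znodes, and all names here are hypothetical):

```java
import java.util.TreeMap;

// Simulated sketch of the rereplication lock recipe: lowest sequence number
// among the znode's children holds the lock; losers delete their node.
public class RereplicationLock {
    // In-memory stand-in for ZK sequential znodes under the ledger znode.
    private final TreeMap<Long, String> children = new TreeMap<>();
    private long nextSequence = 0;

    // "Create an ephemeral sequential znode" -> returns its sequence number.
    public long createLockNode(String bookieId) {
        long seq = nextSequence++;
        children.put(seq, bookieId);
        return seq;
    }

    // The lock holder is whoever owns the lowest sequence number.
    public boolean holdsLock(long mySequence) {
        return !children.isEmpty() && children.firstKey() == mySequence;
    }

    // A loser deletes its znode immediately; the winner deletes it (and then
    // the ledger's rereplication znode) once rereplication is complete.
    public void delete(long sequence) {
        children.remove(sequence);
    }
}
```

Because the znodes would be ephemeral, a bookie that crashes mid-rereplication loses its lock automatically, letting another bookie take over.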
> Automatic recovery of under-replicated ledgers and its entries
> --------------------------------------------------------------
>
> Key: BOOKKEEPER-237
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-237
> Project: Bookkeeper
> Issue Type: New Feature
> Components: bookkeeper-client, bookkeeper-server
> Affects Versions: 4.0.0
> Reporter: Rakesh R
> Assignee: Rakesh R
> Attachments: Auto Recovery and Bookie sync-ups.pdf
>
>
> As per the current design of BookKeeper, if one of the BookKeeper servers
> dies, there is no automatic mechanism to identify and recover the
> under-replicated ledgers and their corresponding entries. This could lead to
> the loss of successfully written entries, which would be a critical problem
> in sensitive systems. This document describes a few proposals to overcome
> these limitations.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira