[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293592#comment-13293592
 ] 

Ivan Kelly commented on BOOKKEEPER-247:
---------------------------------------

A bookie failure is really the failure of a lot of ledger fragments. I think 
the direction of BOOKKEEPER-272 matches that. The sequence of events for a 
bookie failure is:

# Bookie fails
# Auditor puts the list of affected ledgers in the suspected ledgers znode
# Recovery worker takes a ledger from the list and runs this detection on it, 
putting any underreplicated ledger fragments in the underreplicated znode
# Recovery worker takes an underreplicated ledger fragment and rereplicates it

Each bookie runs a recovery worker, so the work of detection and 
rereplication will be distributed, while the auditor that checks the bookies 
will be centralized. Also, I think bookies should run this detection on all 
their ledgers every few hours to detect disk issues.
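
The four steps above can be sketched roughly as follows. This is only an 
illustration in Python, not BookKeeper code: the two znodes are modelled as 
in-memory queues, and the names Fragment, Cluster, audit, detect and 
rereplicate are hypothetical. The detection step is also simplified to a 
liveness check on the bookies of each fragment, whereas the real check would 
have to verify the entries themselves.

```python
from collections import deque
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Fragment:
    ledger_id: int
    first_entry: int
    last_entry: int
    bookies: tuple  # bookies expected to hold this fragment


@dataclass
class Cluster:
    # In-memory stand-ins for ZooKeeper state (hypothetical layout).
    fragments: dict        # ledger_id -> list of Fragment
    live_bookies: set
    suspected: deque = field(default_factory=deque)        # "suspected ledgers znode"
    underreplicated: deque = field(default_factory=deque)  # "underreplicated znode"


def audit(cluster, failed_bookie):
    """Steps 1-2: a bookie fails; the auditor queues every ledger that used it."""
    cluster.live_bookies.discard(failed_bookie)
    for ledger_id, frags in cluster.fragments.items():
        if any(failed_bookie in f.bookies for f in frags):
            cluster.suspected.append(ledger_id)


def detect(cluster):
    """Step 3: a worker takes one suspected ledger and queues its bad fragments."""
    ledger_id = cluster.suspected.popleft()
    for frag in cluster.fragments[ledger_id]:
        # Assumption: a fragment is underreplicated if any of its bookies is dead.
        if any(b not in cluster.live_bookies for b in frag.bookies):
            cluster.underreplicated.append(frag)


def rereplicate(cluster):
    """Step 4: a worker takes one fragment and swaps dead bookies for live ones."""
    frag = cluster.underreplicated.popleft()
    spares = sorted(cluster.live_bookies - set(frag.bookies))
    # Raises IndexError if there are not enough spare bookies; a real worker
    # would have to handle that case.
    new_bookies = tuple(
        b if b in cluster.live_bookies else spares.pop(0) for b in frag.bookies
    )
    repaired = replace(frag, bookies=new_bookies)
    frags = cluster.fragments[frag.ledger_id]
    frags[frags.index(frag)] = repaired
    return repaired
```

Calling audit once per failed bookie, then detect and rereplicate until both 
queues are empty, mirrors the split described above: audit is the centralized 
part, while detect and rereplicate could be run by any worker.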
                
> Detection of under replication
> ------------------------------
>
>                 Key: BOOKKEEPER-247
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-247
>             Project: Bookkeeper
>          Issue Type: Sub-task
>          Components: bookkeeper-client, bookkeeper-server
>            Reporter: Ivan Kelly
>            Assignee: Rakesh R
>
> This JIRA discusses how the bookkeeper system will detect underreplication of 
> ledger entries.

