[ https://issues.apache.org/jira/browse/BOOKKEEPER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293592#comment-13293592 ]
Ivan Kelly commented on BOOKKEEPER-247:
---------------------------------------
A bookie failure is really the failure of many ledger fragments. I think the
direction of BOOKKEEPER-272 matches that. The sequence of events for a bookie
failure is:
# The bookie fails.
# The auditor puts the list of affected ledgers in the suspected ledgers znode.
# A recovery worker takes a ledger from the list and runs this detection on it,
putting any underreplicated ledger fragments in the underreplicated znode.
# A recovery worker takes an underreplicated ledger fragment and rereplicates it.
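The four steps can be sketched as a small in-memory simulation. This is a hypothetical model, not BookKeeper code: the `Fragment` and `Cluster` types and the `auditor_on_bookie_failure`, `worker_detect` and `worker_rereplicate` functions are made up for illustration, and two `deque`s stand in for the suspected ledgers and underreplicated znodes (a real implementation would coordinate through ZooKeeper). A fragment is treated as underreplicated when any bookie in its ensemble is no longer live.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Fragment:
    """Hypothetical fragment: entries from first_entry on, stored on `ensemble`."""
    ledger_id: int
    first_entry: int
    ensemble: tuple  # bookie ids holding a copy of this fragment


@dataclass
class Cluster:
    ledgers: dict       # ledger id -> list of Fragment (stand-in for ledger metadata)
    live_bookies: set   # bookies the auditor currently considers alive
    # In-memory stand-ins for the suspected ledgers and underreplicated znodes.
    suspected: deque = field(default_factory=deque)
    underreplicated: deque = field(default_factory=deque)


def auditor_on_bookie_failure(cluster, failed):
    """Steps 1-2: mark the bookie dead and queue every ledger with a fragment on it."""
    cluster.live_bookies.discard(failed)
    for ledger_id, fragments in cluster.ledgers.items():
        if any(failed in f.ensemble for f in fragments):
            cluster.suspected.append(ledger_id)


def worker_detect(cluster):
    """Step 3: check one suspected ledger and queue its underreplicated fragments."""
    ledger_id = cluster.suspected.popleft()
    for f in cluster.ledgers[ledger_id]:
        if not set(f.ensemble) <= cluster.live_bookies:
            cluster.underreplicated.append(f)


def worker_rereplicate(cluster):
    """Step 4: move one underreplicated fragment onto live bookies.

    Raises IndexError if there are not enough spare live bookies.
    """
    frag = cluster.underreplicated.popleft()
    spares = sorted(cluster.live_bookies - set(frag.ensemble))
    # Keep live members in place; replace each dead member with the next spare.
    new_ensemble = tuple(
        b if b in cluster.live_bookies else spares.pop(0) for b in frag.ensemble
    )
    # A real worker would copy the entries to the new bookies before this step.
    fragments = cluster.ledgers[frag.ledger_id]
    fragments[fragments.index(frag)] = Fragment(frag.ledger_id, frag.first_entry, new_ensemble)
    return new_ensemble


cluster = Cluster(
    ledgers={1: [Fragment(1, 0, ("b1", "b2")), Fragment(1, 100, ("b2", "b3"))],
             2: [Fragment(2, 0, ("b2", "b3"))]},
    live_bookies={"b1", "b2", "b3", "b4"},
)
auditor_on_bookie_failure(cluster, "b1")  # only ledger 1 becomes suspected
worker_detect(cluster)                    # only the fragment starting at entry 0 is queued
print(worker_rereplicate(cluster))        # ('b3', 'b2')
```

Splitting detection (per ledger) from rereplication (per fragment) means any worker can pick up either kind of job independently, which is what lets the work spread across bookies.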
Each bookie runs a recovery worker, so the work of detection and
rereplication is distributed, while the auditor that checks the bookies is
centralized. I also think bookies should run this detection on all of their
ledgers every few hours, to detect disk issues.
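A periodic local check could look roughly like the sketch below. Everything here is hypothetical: `check_local_ledgers`, `run_periodically` and the `read_entry` hook are illustrative names, a local fragment is modelled as a `(ledger_id, first_entry, last_entry)` tuple, and disk errors are assumed to surface as `OSError`. The three-hour interval in the final comment is just one example of "every few hours".

```python
import threading
from typing import Callable, Iterable, List, Tuple

# Hypothetical local view of a fragment: (ledger_id, first_entry, last_entry).
LocalFragment = Tuple[int, int, int]


def check_local_ledgers(fragments: Iterable[LocalFragment],
                        read_entry: Callable[[int, int], bytes]) -> List[LocalFragment]:
    """Return the fragments that have at least one entry that cannot be read back.

    read_entry is a hypothetical hook that reads one entry from local storage
    and is assumed to raise OSError on a disk error.
    """
    bad = []
    for ledger_id, first, last in fragments:
        for entry_id in range(first, last + 1):
            try:
                read_entry(ledger_id, entry_id)
            except OSError:
                bad.append((ledger_id, first, last))
                break  # one unreadable entry is enough to flag the fragment
    return bad


def run_periodically(task: Callable[[], None], interval_s: float,
                     stop: threading.Event) -> threading.Thread:
    """Call task every interval_s seconds on a daemon thread until stop is set."""
    def loop():
        # Event.wait returns False on timeout and True once stop is set.
        while not stop.wait(interval_s):
            task()
    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread


store = {(1, 0): b"a", (1, 1): b"b"}  # entry 2 of ledger 1 is "lost" on disk


def read_entry(ledger_id: int, entry_id: int) -> bytes:
    try:
        return store[(ledger_id, entry_id)]
    except KeyError:
        raise OSError(f"cannot read entry {ledger_id}:{entry_id}") from None


print(check_local_ledgers([(1, 0, 1), (1, 2, 2)], read_entry))  # [(1, 2, 2)]
# In a bookie, a hypothetical scan_and_report task might be scheduled as:
# run_periodically(scan_and_report, 3 * 3600, threading.Event())
```

Flagged fragments would then be published as underreplicated, so the same recovery workers that handle bookie failures also repair local disk damage.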
> Detection of under replication
> ------------------------------
>
> Key: BOOKKEEPER-247
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-247
> Project: Bookkeeper
> Issue Type: Sub-task
> Components: bookkeeper-client, bookkeeper-server
> Reporter: Ivan Kelly
> Assignee: Rakesh R
>
> This JIRA discusses how the bookkeeper system will detect underreplication of
> ledger entries.