[
https://issues.apache.org/jira/browse/BOOKKEEPER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293634#comment-13293634
]
Uma Maheswara Rao G commented on BOOKKEEPER-247:
------------------------------------------------
So far we have been thinking about the sequence as follows:
1. A bookie fails.
2. The Auditor puts the list of affected ledgers in the
suspected/underreplicated ledgers znode.
3. The replication worker takes ledgers one by one from the suspected ledgers
znode and re-replicates each of them.
If we are able to reuse the BookKeeperAdmin code to re-replicate, then
BookKeeperAdmin#recoverLedger already finds the fragments and replicates
them then and there. Am I missing something here?
Otherwise the recovery worker/replication worker may need to watch two levels
of data: 1. the suspected ledgers znode, 2. the underreplicated znode.
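
To make the three-step sequence above concrete, here is a rough in-memory
sketch. It is only an illustration under stated assumptions: a plain dict
stands in for the ledger metadata and a list stands in for the suspected
ledgers znode, and the names `audit` and `replicate_all` are hypothetical, not
existing BookKeeper APIs. Fragment-level handling is left out.

```python
from collections import deque


def audit(ledger_metadata, failed_bookie):
    """Step 2: return the ids of ledgers whose ensemble includes the failed bookie."""
    return sorted(
        ledger_id
        for ledger_id, ensemble in ledger_metadata.items()
        if failed_bookie in ensemble
    )


def replicate_all(ledger_metadata, suspected, failed_bookie, live_bookies):
    """Step 3: take suspected ledgers one by one and replace the failed bookie.

    Assumes live_bookies excludes the failed bookie and contains at least one
    bookie that is not already in each affected ensemble.
    """
    queue = deque(suspected)
    while queue:
        ledger_id = queue.popleft()
        ensemble = ledger_metadata[ledger_id]
        # Pick the first live bookie that does not already hold this ledger.
        replacement = next(b for b in live_bookies if b not in ensemble)
        ledger_metadata[ledger_id] = [
            replacement if b == failed_bookie else b for b in ensemble
        ]


# Step 1: bookie "b2" fails (hypothetical data).
ledgers = {1: ["b1", "b2"], 2: ["b2", "b3"], 3: ["b1", "b3"]}
suspected = audit(ledgers, "b2")
replicate_all(ledgers, suspected, "b2", ["b1", "b3", "b4"])
```

In this sketch the worker only ever looks at one list, which matches the
single-level case; a real worker would watch znodes rather than poll a list.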
{quote}
Also, i think bookies should run this detection on all their ledgers, every
few hours, to detect disk issues
{quote}
I agree. I think this work can be triggered on disk failures and will also
run on an hourly basis by default.
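
As a small sketch of that trigger rule (run on a disk failure, otherwise once
the interval has elapsed), assuming times are given in seconds; the function
and constant names here are hypothetical, not existing configuration keys:

```python
DEFAULT_INTERVAL_SECONDS = 3600  # assumed default: hourly


def should_run_check(now, last_run, disk_failure_seen,
                     interval=DEFAULT_INTERVAL_SECONDS):
    """Return True if the bookie should scan its ledgers now.

    A disk failure triggers the scan immediately; otherwise the scan runs
    once at least `interval` seconds have passed since the last run.
    """
    return disk_failure_seen or (now - last_run) >= interval
```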
> Detection of under replication
> ------------------------------
>
> Key: BOOKKEEPER-247
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-247
> Project: Bookkeeper
> Issue Type: Sub-task
> Components: bookkeeper-client, bookkeeper-server
> Reporter: Ivan Kelly
> Assignee: Rakesh R
>
> This JIRA discusses how the bookkeeper system will detect underreplication of
> ledger entries.