[ https://issues.apache.org/jira/browse/BOOKKEEPER-272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13294534#comment-13294534 ]
Rakesh R commented on BOOKKEEPER-272:
-------------------------------------
{quote}
I don't think we need the bookie
{quote}
Here I can see one race condition. Say the Auditor is about to publish the
failure of BK2 in L0001. Meanwhile, BK4 has finished re-replicating BK3's part
of L0001 and is about to delete the entry from /underreplicated. In this case,
the Auditor will silently continue on seeing that L0001 already exists, and the
worker will delete the L0001 entry, thinking there are no more failures, so
BK2's failure is lost.
The solution I'm thinking of is to check the data version before doing the ZK
operation (similar to the logic we built in BKJM CurrentInProgress). I'm
planning to keep the failed bookie information as the znode data.
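A minimal sketch of the version check, in Java. ZooKeeper is replaced here by a hypothetical in-memory `VersionedStore` so the example runs on its own; its `create`/`setData`/`delete` methods only mimic the versioned semantics of the real ZooKeeper calls (a write carrying a stale expected version is rejected). Storing the failed bookies as comma-separated znode data, and the `UnderReplication` helper names, are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory stand-in for the /underreplicated znodes. Each node
// keeps a data version that is bumped on every write; conditional writes with a
// stale expected version are rejected, similar to ZooKeeper's BadVersion check.
class VersionedStore {
    static final class Node {
        final String data;
        final int version;
        Node(String data, int version) { this.data = data; this.version = version; }
    }

    private final Map<String, Node> nodes = new HashMap<>();

    synchronized Optional<Node> read(String path) {
        return Optional.ofNullable(nodes.get(path));
    }

    // Creates the node at version 0; returns false if it already exists.
    synchronized boolean create(String path, String data) {
        return nodes.putIfAbsent(path, new Node(data, 0)) == null;
    }

    // Overwrites the data only if the stored version equals expectedVersion.
    synchronized boolean setData(String path, String data, int expectedVersion) {
        Node cur = nodes.get(path);
        if (cur == null || cur.version != expectedVersion) return false;
        nodes.put(path, new Node(data, cur.version + 1));
        return true;
    }

    // Deletes the node only if the stored version equals expectedVersion.
    synchronized boolean delete(String path, int expectedVersion) {
        Node cur = nodes.get(path);
        if (cur == null || cur.version != expectedVersion) return false;
        nodes.remove(path);
        return true;
    }
}

class UnderReplication {
    // Auditor side: record failedBookie in the ledger's znode. If the znode
    // already exists, append to its data with a versioned write instead of
    // silently skipping it, and retry if someone else changed it meanwhile.
    static void publish(VersionedStore zk, String path, String failedBookie) {
        while (true) {
            if (zk.create(path, failedBookie)) return;
            Optional<VersionedStore.Node> cur = zk.read(path);
            if (!cur.isPresent()) continue; // deleted in between: try to create again
            String data = cur.get().data;
            if (Arrays.asList(data.split(",")).contains(failedBookie)) return;
            if (zk.setData(path, data + "," + failedBookie, cur.get().version)) return;
        }
    }

    // Worker side: delete the znode only if it is unchanged since the worker
    // read it (versionSeen). A false result means new failures were published
    // meanwhile, so the znode stays and the ledger must be checked again.
    static boolean complete(VersionedStore zk, String path, int versionSeen) {
        return zk.delete(path, versionSeen);
    }
}
```

In the race above, the worker's delete now fails because the Auditor's append bumped the version, so the worker has to re-read the znode and handle BK2 as well instead of dropping it.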
{quote}
As we need to run a check on the ledger to find which parts are underreplicated
(since some segments may not include the failed bookie), we may as well just
record the ledger id. Also, it'd be better to have only one worker fixing a
single ledger, to avoid conflicting writes when updating the ledger metadata.
{quote}
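For the one-worker-per-ledger point, a minimal sketch, assuming a hypothetical `LedgerClaims` table keyed by ledger id. In ZooKeeper, one natural counterpart would be an ephemeral lock znode per ledger, but that is an assumption rather than something settled in this thread; a `ConcurrentHashMap` stands in so the code runs on its own.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical per-ledger claim table: at most one worker owns a ledger at a
// time, so only that worker updates the ledger's metadata.
class LedgerClaims {
    private final ConcurrentMap<Long, String> owners = new ConcurrentHashMap<>();

    // Returns true if workerId now owns ledgerId (or already did);
    // false if another worker holds the claim.
    boolean tryClaim(long ledgerId, String workerId) {
        String prev = owners.putIfAbsent(ledgerId, workerId);
        return prev == null || prev.equals(workerId);
    }

    // Releases the claim only if workerId is the current owner.
    boolean release(long ledgerId, String workerId) {
        return owners.remove(ledgerId, workerId);
    }
}
```

A worker that fails `tryClaim` simply moves on to another underreplicated ledger, so two workers never rewrite the same ledger's metadata concurrently.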
Yeah, I understand. I have one suggestion: the Auditor already knows the failed
bookies and their ledgers when publishing the underreplicated ledgers, so why
don't we keep the failed bookie as data inside the underreplicated ledger
znode? Then the worker (segment checker) only needs to look at this bookie and
get the corresponding index directly from the ZK ledger metadata.
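A rough sketch of what the segment checker could do with the failed bookie read from the znode data, assuming a simplified, hypothetical view of the ledger metadata as a `TreeMap` from each segment's first entry id to its ordered ensemble. `SegmentChecker` and `Hit` are illustrative names only. The checker returns just the segments whose ensemble contains the failed bookie, together with the bookie's index in that ensemble, which would be the position to re-replicate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified ledger metadata: first entry id of a segment -> the
// ordered ensemble of bookies that wrote that segment.
class SegmentChecker {
    // A segment the failed bookie took part in, plus its index in the ensemble.
    static final class Hit {
        final long firstEntryId;
        final int bookieIndex;
        Hit(long firstEntryId, int bookieIndex) {
            this.firstEntryId = firstEntryId;
            this.bookieIndex = bookieIndex;
        }
        @Override public String toString() { return firstEntryId + "@" + bookieIndex; }
    }

    static List<Hit> segmentsWith(TreeMap<Long, List<String>> ensembles, String failedBookie) {
        List<Hit> hits = new ArrayList<>();
        for (Map.Entry<Long, List<String>> seg : ensembles.entrySet()) {
            int idx = seg.getValue().indexOf(failedBookie);
            // Segments whose ensemble does not include the failed bookie are skipped.
            if (idx >= 0) hits.add(new Hit(seg.getKey(), idx));
        }
        return hits;
    }
}
```

For example, with ensembles `[BK1, BK2, BK3]` from entry 0, `[BK1, BK4, BK3]` from entry 100 and `[BK2, BK4, BK5]` from entry 250, looking up `BK2` yields the segments starting at entries 0 (index 1) and 250 (index 0); the middle segment is the case where a segment does not include the failed bookie.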
> Provide automatic mechanism to know bookie failures
> ---------------------------------------------------
>
> Key: BOOKKEEPER-272
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-272
> Project: Bookkeeper
> Issue Type: Sub-task
> Components: bookkeeper-server
> Reporter: Rakesh R
> Assignee: Rakesh R
> Attachments: BOOKKEEPER-272.1.patch, BOOKKEEPER-272.2.patch,
> BOOKKEEPER-272.Auditor.patch
>
>
> The idea is to build an automatic mechanism to detect bookie failures and
> set up bookie failure notifications that start the re-replication process.
> There are multiple approaches to finding bookie failures. Please refer to the
> documents attached to BOOKKEEPER-237.
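One simple way the Auditor could derive what to publish, sketched under stated assumptions: a hypothetical `ledgersByBookie` index (each bookie mapped to the ledgers it stores, built from ledger metadata) and the set of currently available bookies are taken as given, while how a failure is actually detected is left to the approaches in BOOKKEEPER-237. Any bookie that holds ledgers but is not available is treated as failed, and each affected ledger is mapped to the failed bookies it depends on, matching the idea of storing the failed bookie as the znode data. `AuditorSketch` is an illustrative name only.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical Auditor step: bookies that hold ledgers but are no longer
// available are treated as failed; each of their ledgers becomes
// underreplicated, annotated with the failed bookie(s) responsible.
class AuditorSketch {
    static Map<Long, Set<String>> underreplicated(Map<String, Set<Long>> ledgersByBookie,
                                                  Set<String> availableBookies) {
        Map<Long, Set<String>> result = new TreeMap<>();
        for (Map.Entry<String, Set<Long>> e : ledgersByBookie.entrySet()) {
            if (availableBookies.contains(e.getKey())) continue; // bookie still alive
            for (Long ledgerId : e.getValue()) {
                result.computeIfAbsent(ledgerId, k -> new TreeSet<>()).add(e.getKey());
            }
        }
        return result;
    }
}
```

Each resulting entry corresponds to one /underreplicated znode: the key is the ledger id and the value is the data the Auditor would write.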
--
This message is automatically generated by JIRA.