[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284884#comment-13284884
 ] 

Rakesh R commented on BOOKKEEPER-272:
-------------------------------------

Hi Flavio,

Thanks for your interest and comments. I was prototyping the 'CircularChain' 
algorithm, which has no central auditor. Sorry for the late reply :)

But I faced a problem in handling the following situation.

1 <- 2 <- 3 <- 4 <- 5 <- 6 <- 1

Consider the scenario where 2, 3 and 4 go down. 5 gets the notification, marks 
4 as failed, and moves on to 4's neighbour, 3. Suppose 3 rejoins and is alive 
at the moment 5 checks its status; 5 then stops searching. If 3 goes down again 
immediately afterwards, there is a high chance that 2's failure is missed. 
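
To make the scenario above concrete, here is a minimal in-memory sketch of the 
backward walk. It is only an illustration: the function names and the `alive` 
set are hypothetical stand-ins, and no ZooKeeper calls are involved.

```python
def predecessor(node, size):
    """Return the bookie that `node` watches in the ring 1 <- 2 <- ... <- size <- 1."""
    return size if node == 1 else node - 1


def walk_back(start, alive, size):
    """Walk from `start` through its predecessors, collecting failed bookies.

    The walk stops at the first predecessor that is alive at check time.
    Returns (failed_bookies_seen, first_live_predecessor_or_None).
    `alive` is a hypothetical snapshot of the live bookies.
    """
    failed = []
    node = predecessor(start, size)
    while node != start:
        if node in alive:
            return failed, node
        failed.append(node)
        node = predecessor(node, size)
    return failed, None


# 2, 3 and 4 are down and stay down: 5 sees all three failures.
print(walk_back(5, {1, 5, 6}, 6))

# Race: 3 has rejoined when 5 checks it, so the walk stops at 3
# and 2 is never examined, even if 3 fails again right afterwards.
print(walk_back(5, {1, 3, 5, 6}, 6))
```

In the second call the walk reports only 4 as failed, which is the window in 
which 2's failure can go unnoticed.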

One solution that comes to mind: when 5 finds that 3 is alive, it adds a 
watcher on 3. This raises another problem. Suppose 4 has rejoined; it will also 
try to add a watcher on 3. The watcher previously added by 5 will still be 
registered on 3 (as ZK has no way to unregister a watcher). Also, the number of 
such watcher re-registrations will gradually increase.

I'm a bit worried about the chances of missing watchers and about 
isolation/race conditions with this approach. 

If I have an auditor, only it will watch all the Bookies and report the failed 
ones. I feel there is no chance of missing watchers or hitting isolation/race 
conditions with this approach. The only overhead is the Auditor election. 

The auditor (central node) will publish the failed bookie (through a znode). 
After receiving the notification, anyone can acquire the lock and start 
re-replication, and the cycle continues until re-replication is complete. 
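
A rough in-memory sketch of the auditor flow described above, just to show the 
shape of the idea. The class and method names are hypothetical; the `published` 
set and the `owner` map stand in for the znodes and the lock, and nothing here 
talks to ZooKeeper.

```python
class Auditor:
    """Single central node that compares known bookies with the live ones."""

    def __init__(self, known_bookies):
        self.known = set(known_bookies)
        self.published = set()  # stands in for the "failed bookie" znodes

    def audit(self, alive):
        """Publish and return the bookies that are newly seen as failed."""
        newly_failed = self.known - set(alive) - self.published
        self.published |= newly_failed
        return sorted(newly_failed)


class ReplicationLocks:
    """Stand-in for the lock a worker takes before re-replicating."""

    def __init__(self):
        self.owner = {}

    def try_acquire(self, failed_bookie, worker):
        """Return True if `worker` holds the lock for `failed_bookie`."""
        return self.owner.setdefault(failed_bookie, worker) == worker


auditor = Auditor(range(1, 7))
locks = ReplicationLocks()

for bookie in auditor.audit(alive={1, 5, 6}):
    # Any surviving bookie may try; only the first one gets the lock.
    print(bookie, locks.try_acquire(bookie, "bookie-5"),
          locks.try_acquire(bookie, "bookie-6"))
```

Because the auditor looks at the whole membership at once, a bookie that 
briefly rejoins does not hide the failure of the others in this model.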

I'd like to know your opinion on handling bookie failures through a central 
entity.

Thanks,
Rakesh
                
> Introduce chain of bookies for distributing the re-replication
> --------------------------------------------------------------
>
>                 Key: BOOKKEEPER-272
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-272
>             Project: Bookkeeper
>          Issue Type: Sub-task
>          Components: bookkeeper-server
>            Reporter: Rakesh R
>            Assignee: Rakesh R
>
> The idea is to form a logical chain in which each Bookie takes care of its 
> neighbour. On any Bookie failure, its neighbour will act and initiate 
> re-replication.
> For example:
> Assume we have bookies 1,2,3,4,5,6, forming the following chain:
> 01 <- 02 <- 03 <- 04 <- 05 <- 06 <- 01
> Here, each one takes care of its immediate predecessor node. The lowest 
> node always takes care of the highest node, forming a logical closed 
> chain.
> Reference docs attached in BookKeeper-237.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
