[ https://issues.apache.org/jira/browse/BOOKKEEPER-889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15132854#comment-15132854 ]

ASF GitHub Bot commented on BOOKKEEPER-889:
-------------------------------------------

GitHub user sid825 opened a pull request:

    https://github.com/apache/bookkeeper/pull/11

    BOOKKEEPER-889: Bookie client does not try to use unhealthy bookies when forming ensembles

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sid825/bookkeeper BOOKKEEPER-889

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/bookkeeper/pull/11.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11
    
----
commit 48b56997109e3d12093196f8057d58cc18c08af3
Author: Siddharth Boobna <[email protected]>
Date:   2016-02-04T18:28:37Z

    BOOKKEEPER-889: Bookie client does not try to use unhealthy bookies when forming ensembles
    
    Change-Id: Ic875d93a64c05393f410924c53c0ca0dd467c715

----


> BookKeeper client should try not to use bookies with errors/timeouts when forming a new ensemble
> ------------------------------------------------------------------------------------------------
>
>                 Key: BOOKKEEPER-889
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-889
>             Project: Bookkeeper
>          Issue Type: Improvement
>          Components: bookkeeper-client
>    Affects Versions: 4.3.2
>            Reporter: Siddharth Sunil Boobna
>            Assignee: Siddharth Sunil Boobna
>             Fix For: 4.4.0
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Due to various issues (slow disks, network issues, bugs, etc.), a bookie can
> be slow or unresponsive for extended periods of time. During this time,
> read/write operations will fail or time out, and ledgers will create a new
> segment and form a new ensemble that replaces this bookie. New ledgers,
> however, might still pick up this bookie, and if there are multiple faulty
> bookies, the replacement may itself be another faulty bookie.
> The BK client should keep statistics on the failure rate of each bookie and
> should "quarantine" failing bookies for a certain amount of time. Once a
> bookie is quarantined, it will not be picked when forming a new ensemble
> unless no other "healthy" bookies are available.
> Solution:
> Keep a counter of errors in the bookie client pool, periodically check the
> number of errors for each bookie in a given time span, and mark the failing
> bookies as "quarantined" in the BookieWatcher.
> In the BookieWatcher, try to create an ensemble list excluding the
> quarantined bookies; if that fails, fall back to an empty exclusion list.
> We will also remove bookies from the quarantined list after a
> configurable period of time.
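
The solution described above can be sketched in Java roughly as follows. This is a minimal, hypothetical illustration, not the code from pull request #11: the class name `BookieQuarantineTracker`, its methods, the per-window error threshold, the injected clock, and the use of plain `String` bookie ids are all assumptions, and the sketch folds the client-pool counter and the BookieWatcher logic into one class for brevity. It counts errors per bookie, quarantines bookies that reach the threshold within a check window, releases them after a configurable period, and builds ensembles that skip quarantined bookies unless too few healthy ones remain.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.LongSupplier;

// Hypothetical sketch of the BOOKKEEPER-889 quarantine scheme; NOT the actual
// BookKeeper client API. Bookie ids are plain Strings for illustration only.
public class BookieQuarantineTracker {
    private final int errorThreshold;          // assumed: errors per window that trigger quarantine
    private final long quarantinePeriodMillis; // assumed: configurable quarantine duration
    private final LongSupplier clock;          // injected clock in milliseconds, for testability

    // Errors seen since the last periodic check, one counter per bookie.
    private final Map<String, Integer> errorCounts = new HashMap<>();
    // Quarantined bookie -> time (ms) at which its quarantine expires.
    private final Map<String, Long> quarantinedUntil = new HashMap<>();

    public BookieQuarantineTracker(int errorThreshold, long quarantinePeriodMillis,
                                   LongSupplier clock) {
        this.errorThreshold = errorThreshold;
        this.quarantinePeriodMillis = quarantinePeriodMillis;
        this.clock = clock;
    }

    // Called whenever a read/write to a bookie fails or times out.
    public synchronized void recordError(String bookie) {
        errorCounts.merge(bookie, 1, Integer::sum);
    }

    // Run periodically: release expired quarantines, quarantine bookies whose
    // error count in this window reached the threshold, then start a new window.
    public synchronized void periodicCheck() {
        long now = clock.getAsLong();
        releaseExpired(now);
        for (Map.Entry<String, Integer> e : errorCounts.entrySet()) {
            if (e.getValue() >= errorThreshold) {
                quarantinedUntil.put(e.getKey(), now + quarantinePeriodMillis);
            }
        }
        errorCounts.clear();
    }

    // Snapshot of the bookies currently in quarantine.
    public synchronized Set<String> quarantined() {
        releaseExpired(clock.getAsLong());
        return new HashSet<>(quarantinedUntil.keySet());
    }

    // Choose ensembleSize bookies, first excluding quarantined ones; if too few
    // healthy bookies remain, fall back to an empty exclusion list.
    public synchronized List<String> newEnsemble(List<String> available, int ensembleSize) {
        releaseExpired(clock.getAsLong());
        List<String> healthy = new ArrayList<>();
        for (String bookie : available) {
            if (!quarantinedUntil.containsKey(bookie)) {
                healthy.add(bookie);
            }
        }
        List<String> candidates;
        if (healthy.size() >= ensembleSize) {
            candidates = healthy;
        } else {
            candidates = new ArrayList<>(available); // fallback: exclude nobody
        }
        if (candidates.size() < ensembleSize) {
            throw new IllegalStateException(
                "not enough bookies: " + candidates.size() + " < " + ensembleSize);
        }
        Collections.shuffle(candidates); // random placement among the candidates
        return new ArrayList<>(candidates.subList(0, ensembleSize));
    }

    // Drop quarantine entries whose expiry time has passed.
    private void releaseExpired(long now) {
        quarantinedUntil.values().removeIf(until -> until <= now);
    }
}
```

Clearing the counters in `periodicCheck()` is what turns a plain error counter into a per-time-window error rate; in a real client such a method would presumably be driven by a scheduled executor. Expiry is also checked lazily, so a bookie leaves quarantine on time even between periodic checks.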



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
