[ https://issues.apache.org/jira/browse/BOOKKEEPER-889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15132854#comment-15132854 ]
ASF GitHub Bot commented on BOOKKEEPER-889:
-------------------------------------------
GitHub user sid825 opened a pull request:
https://github.com/apache/bookkeeper/pull/11
BOOKKEEPER-889: Bookie client does not try to use unhealthy bookies when forming ensembles
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/sid825/bookkeeper BOOKKEEPER-889
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/bookkeeper/pull/11.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #11
----
commit 48b56997109e3d12093196f8057d58cc18c08af3
Author: Siddharth Boobna <[email protected]>
Date: 2016-02-04T18:28:37Z
BOOKKEEPER-889: Bookie client does not try to use unhealthy bookies when
forming ensembles
Change-Id: Ic875d93a64c05393f410924c53c0ca0dd467c715
----
> BookKeeper client should try not to use bookies with errors/timeouts when
> forming a new ensemble
> ------------------------------------------------------------------------------------------------
>
> Key: BOOKKEEPER-889
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-889
> Project: Bookkeeper
> Issue Type: Improvement
> Components: bookkeeper-client
> Affects Versions: 4.3.2
> Reporter: Siddharth Sunil Boobna
> Assignee: Siddharth Sunil Boobna
> Fix For: 4.4.0
>
> Original Estimate: 48h
> Remaining Estimate: 48h
>
> Due to various issues (slow disks, network problems, bugs, etc.), a bookie
> can be slow or unresponsive for extended periods of time. During this time,
> read/write operations against it fail or time out, and the affected ledgers
> create a new segment and form a new ensemble that replaces this bookie. New
> ledgers, however, might still pick this bookie, and if there are multiple
> faulty bookies, the replacement may itself be faulty.
> The BK client should keep stats about failure rates for all the bookies
> and "quarantine" failing bookies for a certain amount of time. Once a bookie
> is quarantined, it will not be picked when forming a new ensemble unless no
> other "healthy" bookies are available.
> Solution:
> Keep a counter of errors per bookie in the bookie client pool. Periodically
> check the number of errors within a given time span and mark the bookies
> that accumulated too many errors as "quarantined" in the BookieWatcher.
> In the BookieWatcher, try to create an ensemble list excluding the
> quarantined bookies and, if that fails, fall back to an empty exclusion list.
> Bookies are also removed from the quarantined list after a configurable
> period of time.
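
The proposed solution can be illustrated with a minimal, self-contained sketch. This is not the code from the pull request and does not use the real BookKeeper client API: the class name QuarantineSketch, its methods, the error threshold, and the use of plain strings as bookie addresses are all hypothetical, chosen only to show the three steps (count errors, quarantine on a periodic check, fall back to an empty exclusion list).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of bookie quarantining; not the BookKeeper API. */
class QuarantineSketch {
    private final int errorThreshold;     // assumed: errors per check that trigger quarantine
    private final long quarantineMillis;  // assumed: configurable quarantine duration
    private final Map<String, Integer> errorCounts = new HashMap<>();
    private final Map<String, Long> quarantinedUntil = new HashMap<>();

    QuarantineSketch(int errorThreshold, long quarantineMillis) {
        this.errorThreshold = errorThreshold;
        this.quarantineMillis = quarantineMillis;
    }

    /** Called whenever a read/write against a bookie fails or times out. */
    synchronized void recordError(String bookie) {
        errorCounts.merge(bookie, 1, Integer::sum);
    }

    /** Periodic check: quarantine bookies at or above the threshold, then reset counters. */
    synchronized void checkErrors(long nowMillis) {
        for (Map.Entry<String, Integer> e : errorCounts.entrySet()) {
            if (e.getValue() >= errorThreshold) {
                quarantinedUntil.put(e.getKey(), nowMillis + quarantineMillis);
            }
        }
        errorCounts.clear();
    }

    /** Drops expired entries and returns the bookies still quarantined. */
    synchronized Set<String> quarantined(long nowMillis) {
        quarantinedUntil.values().removeIf(until -> until <= nowMillis);
        return new HashSet<>(quarantinedUntil.keySet());
    }

    /** Tries to avoid quarantined bookies; falls back to an empty exclusion list. */
    synchronized List<String> newEnsemble(List<String> available, int size, long nowMillis) {
        List<String> picked = pick(available, size, quarantined(nowMillis));
        if (picked == null) {
            picked = pick(available, size, new HashSet<>());
        }
        if (picked == null) {
            throw new IllegalStateException("not enough bookies for ensemble of size " + size);
        }
        return picked;
    }

    /** Picks the first `size` non-excluded bookies, or returns null if there are too few. */
    private static List<String> pick(List<String> available, int size, Set<String> excluded) {
        List<String> result = new ArrayList<>();
        for (String bookie : available) {
            if (result.size() == size) {
                break;
            }
            if (!excluded.contains(bookie)) {
                result.add(bookie);
            }
        }
        return result.size() == size ? result : null;
    }
}
```

In this sketch the caller passes the current time explicitly, which keeps the expiry logic easy to test; a real client would more likely run checkErrors on a scheduled executor and use a placement policy rather than taking the first available bookies.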