[
https://issues.apache.org/jira/browse/HBASE-16388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15490850#comment-15490850
]
stack commented on HBASE-16388:
-------------------------------
The Findbugs issue is unrelated:
Code	Warning
RV	Return value of java.util.concurrent.CountDownLatch.await(long, TimeUnit) ignored in org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper()
Looks like the attempt at fixing this elsewhere did not work.
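For context, the warning is about dropping the boolean that CountDownLatch.await(long, TimeUnit) returns. A minimal sketch of the sort of check Findbugs is asking for (the latch, timeout and message below are placeholders, not the actual initializeZooKeeper() code):
{code:java}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class LatchWaitSketch {
  // await(long, TimeUnit) returns false when the timeout elapses before the
  // count reaches zero; acting on that value is what clears the RV warning.
  static void waitForTracker(CountDownLatch latch) throws InterruptedException {
    if (!latch.await(30, TimeUnit.SECONDS)) {
      System.err.println("Timed out waiting for tracker to come up");
    }
  }
}
{code}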
I ran the tests locally and they passed. I still need to dig into why they are
failing here, but they seem unrelated to this patch. Going to commit.
> Prevent client threads being blocked by only one slow region server
> -------------------------------------------------------------------
>
> Key: HBASE-16388
> URL: https://issues.apache.org/jira/browse/HBASE-16388
> Project: HBase
> Issue Type: New Feature
> Reporter: Phil Yang
> Assignee: Phil Yang
> Attachments: HBASE-16388-branch-1-v1.patch,
> HBASE-16388-branch-1-v2.patch, HBASE-16388-v1.patch, HBASE-16388-v2.patch,
> HBASE-16388-v2.patch, HBASE-16388-v2.patch, HBASE-16388-v2.patch,
> HBASE-16388-v3.patch
>
>
> It is a common pattern for HBase users to have several threads/handlers in
> their service, each with its own Table/HTable instance. Users generally
> assume the handlers are independent and won't interact with each other.
> However, in an extreme case, if one region server is very slow, every
> request to that RS will time out. The long-waiting requests can occupy the
> service's handlers, so even requests that belong to other RSs end up timing
> out as well.
> For example:
> Suppose we have 100 handlers in a client service (timeout 1000 ms) and
> HBase has 10 region servers whose average response time is 50 ms. If no
> region server is slow, each handler can finish 1000/50 = 20 requests per
> second, so the service can handle 2000 requests per second.
> Now suppose the service's QPS is 1000 and one region server is so slow that
> every request to it times out. Users would hope that only 10% of requests
> fail and the other 90% still respond in 50 ms, because only 10% of requests
> are routed to the slow RS. However, every second 100 requests wait out the
> full 1000 ms timeout on the slow RS, which is exactly enough to occupy all
> 100 handlers. So all handlers are blocked and the availability of the
> service drops to almost zero; the sketch below works through the arithmetic.
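> As a back-of-the-envelope check of the numbers above (a standalone sketch,
> not part of the patch; the class and variable names are illustrative):
> {code:java}
> // Rough arithmetic behind the example: 100 handlers, 1000 ms timeout,
> // 50 ms average latency, 1000 QPS, 1 slow RS out of 10.
> public class HandlerSaturationMath {
>   public static void main(String[] args) {
>     int handlers = 100;
>     double avgLatencyMs = 50;
>     double timeoutMs = 1000;
>     double qps = 1000;
>     double slowRsShare = 0.1; // 1 of 10 region servers is slow
>
>     // Healthy capacity: each handler completes 1000/50 = 20 requests/second.
>     double healthyCapacity = handlers * (1000.0 / avgLatencyMs); // 2000 req/s
>
>     // Requests per second hitting the slow RS, each holding a handler for
>     // the full timeout: 100 req/s * 1 s = 100 handlers tied up at any moment.
>     double stuckHandlers = qps * slowRsShare * (timeoutMs / 1000.0);
>
>     System.out.printf("healthy capacity: %.0f req/s%n", healthyCapacity);
>     System.out.printf("handlers stuck on slow RS: %.0f of %d%n",
>         stuckHandlers, handlers);
>   }
> }
> {code}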
> To prevent this, we can limit the maximum number of concurrent requests to
> a single RS at the process level. Requests exceeding the limit throw a
> ServerBusyException (extends DoNotRetryIOE) to users immediately, as
> sketched below. In the above case, if we set the limit to 20, only 20
> handlers will be occupied and the other 80 handlers can still serve
> requests to the other RSs. The availability of the service stays at 90%,
> as expected.
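> A minimal client-side sketch of how such a limit could be configured and
> handled; the config key name below is an assumption (it may not match what
> the patch finally uses), and the catch is on DoNotRetryIOException since
> ServerBusyException extends it:
> {code:java}
> // Sketch only: enable an assumed per-server request limit and fail fast
> // instead of holding a handler for the full client timeout.
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.DoNotRetryIOException;
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.TableName;
> import org.apache.hadoop.hbase.client.Connection;
> import org.apache.hadoop.hbase.client.ConnectionFactory;
> import org.apache.hadoop.hbase.client.Get;
> import org.apache.hadoop.hbase.client.Table;
> import org.apache.hadoop.hbase.util.Bytes;
>
> public class PerServerLimitExample {
>   public static void main(String[] args) throws Exception {
>     Configuration conf = HBaseConfiguration.create();
>     // Assumed config key: allow at most 20 in-flight requests per region server.
>     conf.setInt("hbase.client.perserver.requests.threshold", 20);
>
>     try (Connection conn = ConnectionFactory.createConnection(conf);
>          Table table = conn.getTable(TableName.valueOf("t1"))) {
>       try {
>         table.get(new Get(Bytes.toBytes("row1")));
>       } catch (DoNotRetryIOException e) {
>         // With the limit exceeded the request fails immediately here, so the
>         // handler is free to serve requests to the other region servers.
>         System.err.println("Server busy, failing fast: " + e.getMessage());
>       }
>     }
>   }
> }
> {code}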
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)