[jira] [Commented] (HBASE-16388) Prevent client threads being blocked by only one slow region server

Mikhail Antonov (JIRA) Tue, 18 Oct 2016 16:24:07 -0700

    [ https://issues.apache.org/jira/browse/HBASE-16388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587002#comment-15587002 ]


Mikhail Antonov commented on HBASE-16388:
-----------------------------------------

[~yangzhe1991]

A good one, I missed that. Is it fair to say that the primary motivation for 
this is that the global, per-region and per-server limits in AP are flawed, 
since they are only ever enforced on the write path (going through 
AP#submit() / buffered mutator)?

With this being a client-only change, I'd consider backporting it to 1.3. 
Anything that reduces the blast radius of a bad RS is an important 
reliability fix IMO.

> Prevent client threads being blocked by only one slow region server
> -------------------------------------------------------------------
>
>                 Key: HBASE-16388
>                 URL: https://issues.apache.org/jira/browse/HBASE-16388
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Phil Yang
>            Assignee: Phil Yang
>             Fix For: 2.0.0, 1.4.0
>
>         Attachments: HBASE-16388-branch-1-v1.patch, 
> HBASE-16388-branch-1-v2.patch, HBASE-16388-v1.patch, HBASE-16388-v2.patch, 
> HBASE-16388-v2.patch, HBASE-16388-v2.patch, HBASE-16388-v2.patch, 
> HBASE-16388-v3.patch
>
>
> A common use case for HBase users is a service with several 
> threads/handlers, where each handler has its own Table/HTable instance. 
> Users generally assume that the handlers are independent and do not 
> affect each other.
> However, in an extreme case, if one region server is very slow, every 
> request to that RS will time out, and the service's handlers may all be 
> occupied by these long-waiting requests, so even requests to other RSs 
> will time out.
> For example: 
> Suppose a client service has 100 handlers (timeout is 1000ms) and HBase has 
> 10 region servers whose average response time is 50ms. If no region server 
> is slow, the service can handle 2000 requests per second.
> Now suppose the service's QPS is 1000 and one region server becomes so slow 
> that all requests to it time out. Users hope that only 10% of requests fail 
> and that the other 90% still respond in 50ms, because only 10% of requests 
> go to the slow RS. However, each second brings 100 long-waiting requests, 
> which occupy exactly all 100 handlers. So all handlers are blocked, and the 
> availability of the service is almost zero.
> To prevent this, we can limit the maximum number of concurrent requests to 
> one RS at the process level. Requests exceeding the limit will immediately 
> throw ServerBusyException (extends DoNotRetryIOE) to the user. In the above 
> case, if we set this limit to 20, only 20 handlers will be occupied and the 
> other 80 handlers can still handle requests to other RSs. The availability 
> of the service is 90%, as expected.
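
For readers following the description above, here is a minimal sketch of the idea of a process-level, per-server cap that fails fast. It is not the code from the attached patches. The class PerServerLimiter, its methods, and the use of a plain server-name String as the key are all hypothetical, and the nested ServerBusyException is only a stand-in that extends IOException rather than HBase's DoNotRetryIOException.

```java
import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical illustration only; not the HBase client implementation. */
class PerServerLimiter {

    /** Stand-in for the ServerBusyException named in the issue description. */
    static class ServerBusyException extends IOException {
        ServerBusyException(String server, int limit) {
            super("More than " + limit + " concurrent requests to " + server);
        }
    }

    private final int maxPerServer;
    // One in-flight counter per server, shared by every handler in the process.
    private final ConcurrentHashMap<String, AtomicInteger> inFlight =
            new ConcurrentHashMap<>();

    PerServerLimiter(int maxPerServer) {
        this.maxPerServer = maxPerServer;
    }

    /** Reserves a slot for the server; returns false if the cap is reached. */
    boolean tryAcquire(String server) {
        AtomicInteger counter =
                inFlight.computeIfAbsent(server, s -> new AtomicInteger());
        if (counter.incrementAndGet() > maxPerServer) {
            counter.decrementAndGet(); // undo the reservation we just made
            return false;
        }
        return true;
    }

    /** Same as tryAcquire, but fails fast with an exception instead of false. */
    void acquire(String server) throws ServerBusyException {
        if (!tryAcquire(server)) {
            throw new ServerBusyException(server, maxPerServer);
        }
    }

    /** Must be called once for every successful acquire, e.g. in a finally block. */
    void release(String server) {
        AtomicInteger counter = inFlight.get(server);
        if (counter != null) {
            counter.decrementAndGet();
        }
    }

    /** Current number of reserved slots for the server. */
    int inFlight(String server) {
        AtomicInteger counter = inFlight.get(server);
        return counter == null ? 0 : counter.get();
    }
}
```

With the numbers from the example (100 handlers, limit 20), a caller would wrap each RPC in acquire(server) / try / finally release(server). At most 20 handlers can then be parked on the slow RS at any moment, and the rest get an immediate exception for that server instead of waiting out the 1000ms timeout.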



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
