Phil Yang created HBASE-16388:
---------------------------------

             Summary: Prevent client threads being blocked by only one slow 
region server
                 Key: HBASE-16388
                 URL: https://issues.apache.org/jira/browse/HBASE-16388
             Project: HBase
          Issue Type: New Feature
            Reporter: Phil Yang
            Assignee: Phil Yang


A common pattern among HBase users is to run several threads/handlers in a 
service, each with its own Table/HTable instance. Users generally assume that 
the handlers are independent and do not affect each other.

However, in an extreme case where one region server (RS) is very slow, every 
request to that RS times out. The handlers of the user's service can then all 
be occupied by these long-waiting requests, so that even requests that belong 
to other region servers time out.

For example: 
Suppose a client service has 100 handlers (timeout is 1000ms) and HBase has 10 
region servers whose average response time is 50ms. If no region server is 
slow, the service can handle 2000 requests per second.
Now suppose this service's QPS is 1000, and one region server becomes so slow 
that all requests to it time out. Users expect only 10% of requests to fail and 
the other 90% to keep a response time of 50ms, because only 10% of requests go 
to the slow RS. However, each second brings 100 long-waiting requests, which 
occupy exactly all 100 handlers. So all handlers are blocked, and the 
availability of this service is almost zero.
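
The arithmetic above can be checked with a short calculation. This is only a 
sketch: the class and method names are hypothetical, and it assumes requests 
are spread evenly over the 10 region servers and that each request to the slow 
RS holds its handler for the full 1000ms timeout.

```java
public class HandlerOccupancy {
    // Average number of handlers busy at once (Little's law):
    // arrival rate (requests per second) times the time each request holds a handler.
    public static long busyHandlers(long requestsPerSecond, long holdTimeMs) {
        return requestsPerSecond * holdTimeMs / 1000;
    }

    public static void main(String[] args) {
        long handlers = 100, timeoutMs = 1000, latencyMs = 50;
        long qps = 1000, regionServers = 10;

        // Upper bound on throughput when every request takes 50ms.
        long maxThroughput = handlers * 1000 / latencyMs;

        // Handlers needed at 1000 QPS when all region servers are healthy.
        long healthyBusy = busyHandlers(qps, latencyMs);

        // One slow RS: its share of the traffic holds handlers until the timeout.
        long slowQps = qps / regionServers;
        long slowBusy = busyHandlers(slowQps, timeoutMs);
        long otherBusy = busyHandlers(qps - slowQps, latencyMs);

        System.out.println("max throughput (req/s):       " + maxThroughput);
        System.out.println("busy handlers, all healthy:   " + healthyBusy);
        System.out.println("busy handlers for slow RS:    " + slowBusy);
        System.out.println("busy handlers for other RSs:  " + otherBusy);
        System.out.println("handlers available:           " + handlers);
    }
}
```

Under these assumptions the slow RS alone needs as many handlers as the 
service has, so no handler is left for the healthy region servers.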

To prevent this case, we can limit the maximum number of concurrent requests to 
one RS at the process level. A request exceeding the limit immediately throws 
ServerBusyException (extends DoNotRetryIOE) to the user. In the above 
case, if we set this limit to 20, only 20 handlers will be occupied and the 
other 80 handlers can still handle requests to other RSs. The availability of 
this service is then 90%, as expected.
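
A minimal sketch of the idea, not the actual patch: a process-wide counter per 
region server that rejects a request immediately once the limit is reached. 
The PerServerLimiter class, its call() method, and the use of a String as the 
server key are assumptions made for illustration. The ServerBusyException here 
is a local stand-in that extends IOException so the example is self-contained; 
the proposal has it extend DoNotRetryIOE.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class PerServerLimiter {
    // Stand-in for the proposed exception (hypothetical, see note above).
    public static class ServerBusyException extends IOException {
        public ServerBusyException(String message) {
            super(message);
        }
    }

    private final int maxConcurrentPerServer;
    // One in-flight counter per region server, shared by all handlers in the process.
    private final ConcurrentHashMap<String, AtomicInteger> inFlight = new ConcurrentHashMap<>();

    public PerServerLimiter(int maxConcurrentPerServer) {
        this.maxConcurrentPerServer = maxConcurrentPerServer;
    }

    // Runs rpc if the server is under its limit; otherwise fails fast without waiting.
    public <T> T call(String server, Callable<T> rpc) throws Exception {
        AtomicInteger count = inFlight.computeIfAbsent(server, s -> new AtomicInteger());
        if (count.incrementAndGet() > maxConcurrentPerServer) {
            count.decrementAndGet();
            throw new ServerBusyException("Too many concurrent requests to " + server);
        }
        try {
            return rpc.call();
        } finally {
            // Release the slot whether the request succeeded, failed, or timed out.
            count.decrementAndGet();
        }
    }

    public int inFlightCount(String server) {
        AtomicInteger count = inFlight.get(server);
        return count == null ? 0 : count.get();
    }
}
```

With new PerServerLimiter(20) shared by all 100 handlers, at most 20 of them 
can be waiting on the slow RS at any moment; the rest fail fast on that RS and 
stay free for requests to the other region servers.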



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
