[ 
https://issues.apache.org/jira/browse/HBASE-17114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15672280#comment-15672280
 ] 

Gary Helmling commented on HBASE-17114:
---------------------------------------

The new CoDel may help in successfully processing more requests in these 
overloaded situations.

But, in general, I'm not sure we should handle CQTBE differently from any other 
retry-triggering exception (other than RetryImmediatelyException), and giving 
another knob to configure seems like it would just further complicate HBase 
tuning.

Another approach to this would be to allow the server to hint back to the 
client how long it should back off.  In this case, the exception itself could 
carry a multiplier as part of the payload.  As the server remains overloaded 
for a longer and longer period of time, it could increase the multiplier 
returned in the exception, which would allow it to hint to clients that they 
should back off for longer.  The heuristics for doing this correctly may be 
tricky to get right, but I think this could be more generally applicable.  We 
could introduce a new parent exception (RetryIOException) to contain the 
multiplier and apply this in all situations that make sense.  However, this 
would also require a change to RPC to carry through the multiplier value.  This 
isn't perfect either -- the multiplier received by the client represents the 
server state at a previous point in time, which may already have changed.  But 
I think this is better than just statically configuring different pauses for 
different exceptions.
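
To make the multiplier idea concrete, here is a rough sketch of how it might look. None of this exists in HBase today: RetryIOException is only the parent exception proposed above, and the class name BackoffHintSketch, the methods multiplierFor and pauseMillis, the 10-second step, and the 10x cap are assumptions made up for illustration. The RPC change needed to carry the value over the wire is left out entirely.

```java
import java.io.IOException;

public class BackoffHintSketch {

    /** Hypothetical parent exception carrying a server-supplied backoff multiplier. */
    static class RetryIOException extends IOException {
        private final int backoffMultiplier;

        RetryIOException(String message, int backoffMultiplier) {
            super(message);
            // Treat anything below 1 as "no hint" so the pause never shrinks.
            this.backoffMultiplier = Math.max(1, backoffMultiplier);
        }

        int getBackoffMultiplier() {
            return backoffMultiplier;
        }
    }

    /** Server side (assumed heuristic): grow the hint the longer the overload lasts. */
    static int multiplierFor(long overloadedMillis) {
        // One extra step per 10 seconds of sustained overload, capped at 10x.
        long steps = overloadedMillis / 10_000L;
        return (int) Math.min(10L, 1L + steps);
    }

    /** Client side: scale the base pause by the hint when the exception carries one. */
    static long pauseMillis(long basePauseMillis, IOException e) {
        if (e instanceof RetryIOException) {
            return basePauseMillis * ((RetryIOException) e).getBackoffMultiplier();
        }
        // Any other retry-triggering exception keeps the plain base pause.
        return basePauseMillis;
    }

    public static void main(String[] args) {
        // 100 ms base pause, matching the hbase.client.pause default implied by the issue.
        long basePause = 100L;
        IOException hinted =
            new RetryIOException("call queue too big", multiplierFor(45_000L));
        System.out.println(pauseMillis(basePause, hinted));
        System.out.println(pauseMillis(basePause, new IOException("other")));
    }
}
```

With a server that has been overloaded for 45 seconds, this sketch hints a 5x multiplier, so the client pauses 500 ms instead of 100 ms, while exceptions without a hint keep the base pause. The hint is still stale by the time the client acts on it, as noted above.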

> Add an option to set special retry pause when encountering 
> CallQueueTooBigException
> -----------------------------------------------------------------------------------
>
>                 Key: HBASE-17114
>                 URL: https://issues.apache.org/jira/browse/HBASE-17114
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Yu Li
>            Assignee: Yu Li
>
> As titled, after HBASE-15146 we throw {{CallQueueTooBigException}} 
> instead of dead-waiting. This is good for performance in most cases, but it 
> can have a side effect: if too many clients connect to a busy RS, the retry 
> requests may arrive over and over again so that the RS never gets a chance 
> to recover. The issue becomes especially critical when the target region is 
> META.
> So in this JIRA we propose a special retry pause for CQTBE, configured via 
> {{hbase.client.pause.special}}, which defaults to 500ms (5 times the 
> default value of {{hbase.client.pause}}).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
