[jira] [Commented] (HBASE-17114) Add an option to set special retry pause when encountering CallQueueTooBigException

Yu Li (JIRA) Wed, 16 Nov 2016 20:11:58 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-17114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15672662#comment-15672662
 ]


Yu Li commented on HBASE-17114:
-------------------------------

Thanks [~ghelmling] for the feedback and [~tedyu]/[~zghaobac] for chiming in.

bq. But, in general, I'm not sure we should handle CQTBE differently from any 
other retry-triggering exception (other than RetryImmediatelyException), and 
giving another knob to configure seems like it would just further complicate 
HBase tuning.
AFAICS we're already doing this in 
{{ClientExceptionsUtil#isMetaClearingException}} and treating 
CQTBE/RegionTooBusyException etc. as special exceptions:
{code}
  public static boolean isSpecialException(Throwable cur) {
    return (cur instanceof RegionMovedException || cur instanceof RegionOpeningException
        || cur instanceof RegionTooBusyException || cur instanceof ThrottlingException
        || cur instanceof CallQueueTooBigException);
  }
{code}
So handling CQTBE specially may not seem so special?

bq. Another approach to this would be to allow the server to hint back to the 
client how long it should back off
Agreed, this is another good way to handle it, but by default we are still 
using {{NoBackoffPolicy}}, right? So whatever new mechanism we add to the 
backoff policy won't take effect by default. In our case we're not turning on 
backoff, so this solution won't work for us out of the box.

IMHO we could open another JIRA to introduce the fancier backoff solution, and 
since XiaoMi already has a patch running online I guess [~zghaobac] may like 
to take the new JIRA? (And to be frank, this kind of patch is very welcome 
upstream rather than being kept private :-)). Meanwhile, we should also 
resolve the problem for users who don't use backoff. Since the problem does 
exist and we already have some special exception handling logic on the client 
side, isn't the method I proposed still valid?

I'm uploading the patch; it will show how large the changes are, so we can 
better check whether it breaks any code scalability/grace. Let me know your 
thoughts.

> Add an option to set special retry pause when encountering 
> CallQueueTooBigException
> -----------------------------------------------------------------------------------
>
>                 Key: HBASE-17114
>                 URL: https://issues.apache.org/jira/browse/HBASE-17114
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Yu Li
>            Assignee: Yu Li
>
> As titled, after HBASE-15146 we throw {{CallQueueTooBigException}} instead 
> of dead-waiting. This is good for performance in most cases, but it might 
> have a side effect: if too many clients connect to a busy RS, the retry 
> requests may come over and over again and the RS never gets the chance to 
> recover. The issue becomes especially critical when the target region is 
> META.
> So in this JIRA we propose to supply a special retry pause for CQTBE, named 
> {{hbase.client.pause.special}}, which defaults to 500ms (5 times the 
> {{hbase.client.pause}} default value).
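
To make the proposal in the description concrete, here is a minimal sketch of how a client could pick its base retry pause by exception type. It is not the attached patch: the class {{RetryPauseSketch}}, the method {{basePauseMs}}, the map-based config, and the stand-in exception class are all hypothetical, so that the example compiles without HBase on the classpath. Only the two property names and the 100ms/500ms defaults come from the description.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class RetryPauseSketch {
  // Hypothetical stand-in for HBase's CallQueueTooBigException,
  // declared here only so the sketch is self-contained.
  static class CallQueueTooBigException extends IOException {}

  static final String PAUSE_KEY = "hbase.client.pause";
  // Property name as proposed in the issue description.
  static final String SPECIAL_PAUSE_KEY = "hbase.client.pause.special";
  static final long DEFAULT_PAUSE_MS = 100L;         // 500ms is stated to be 5x this default
  static final long DEFAULT_SPECIAL_PAUSE_MS = 500L; // proposed default for CQTBE

  // Returns the base pause before the next retry: the special pause for
  // CQTBE, the normal pause for every other retry-triggering exception.
  // The config is modeled as a plain map (an assumption of this sketch).
  static long basePauseMs(Map<String, Long> conf, Throwable error) {
    if (error instanceof CallQueueTooBigException) {
      return conf.getOrDefault(SPECIAL_PAUSE_KEY, DEFAULT_SPECIAL_PAUSE_MS);
    }
    return conf.getOrDefault(PAUSE_KEY, DEFAULT_PAUSE_MS);
  }

  public static void main(String[] args) {
    Map<String, Long> conf = new HashMap<>();
    System.out.println(basePauseMs(conf, new IOException("other")));      // prints 100
    System.out.println(basePauseMs(conf, new CallQueueTooBigException())); // prints 500
  }
}
```

With neither property set, only CQTBE retries wait longer, so clients that keep the default {{NoBackoffPolicy}} would still ease off a busy RS.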



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
