[ https://issues.apache.org/jira/browse/HBASE-17114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15672662#comment-15672662 ]
Yu Li commented on HBASE-17114:
-------------------------------
Thanks [~ghelmling] for the feedback and [~tedyu]/[~zghaobac] for chiming in.
bq. But, in general, I'm not sure we should handle CQTBE differently from any
other retry-triggering exception (other than RetryImmediatelyException), and
giving another knob to configure seems like it would just further complicate
HBase tuning.
AFAICS we're already doing this in
{{ClientExceptionsUtil#isMetaClearingException}}, treating
CQTBE/RegionTooBusyException etc. as special exceptions:
{code}
public static boolean isSpecialException(Throwable cur) {
  return (cur instanceof RegionMovedException || cur instanceof RegionOpeningException
      || cur instanceof RegionTooBusyException || cur instanceof ThrottlingException
      || cur instanceof CallQueueTooBigException);
}
{code}
So handling CQTBE specially may not seem so special?
bq. Another approach to this would be to allow the server to hint back to the
client how long it should back off
Agreed, this is another good way to handle it, but by default we are still
using {{NoBackoffPolicy}}, right? So whatever new mechanism we add to the
backoff policy won't take effect by default. In our case, for example, we don't
turn on backoff, so this solution won't work for us out of the box.
IMHO we could open another JIRA to introduce the fancier backoff solution, and
since XiaoMi already has a patch running online I guess [~zghaobac] may like to
take the new JIRA? (And to be frank, this kind of patch is very welcome
upstream rather than kept private :-)). Meanwhile, we should also resolve the
problem for users who don't use backoff. Since the problem does exist and we
already have some special exception handling logic on the client side, the
method I proposed is still valid?
I'm uploading the patch; it will show how large the changes are so we can
better check whether it hurts code scalability or elegance. Let me know your
thoughts.
> Add an option to set special retry pause when encountering
> CallQueueTooBigException
> -----------------------------------------------------------------------------------
>
> Key: HBASE-17114
> URL: https://issues.apache.org/jira/browse/HBASE-17114
> Project: HBase
> Issue Type: Bug
> Reporter: Yu Li
> Assignee: Yu Li
>
> As titled, after HBASE-15146 we throw {{CallQueueTooBigException}} instead
> of dead-waiting. This is good for performance in most cases but has a side
> effect: if too many clients connect to a busy RS, the retry requests may come
> over and over again and the RS never gets a chance to recover. The issue
> becomes especially critical when the target region is META.
> So in this JIRA we propose a special retry pause for CQTBE, configured as
> {{hbase.client.pause.special}}, defaulting to 500ms (5 times the
> {{hbase.client.pause}} default value).
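
For illustration, the proposal above might be sketched roughly as follows. This is not the attached patch and not the HBase client API: the class {{RetryPauseSketch}}, the method {{pauseFor}}, and the nested exception type are hypothetical stand-ins so the example compiles on its own. The 100ms value is inferred from the description (500ms being 5 times the {{hbase.client.pause}} default).

```java
import java.io.IOException;

// Illustrative sketch only: names here are hypothetical, not HBase code.
final class RetryPauseSketch {

  // Local stand-in for the real CallQueueTooBigException so the sketch is self-contained.
  static class CallQueueTooBigException extends IOException {}

  // Defaults taken from the issue description: the special pause is 5x the normal pause.
  static final long DEFAULT_PAUSE_MS = 100L;          // hbase.client.pause
  static final long DEFAULT_SPECIAL_PAUSE_MS = 500L;  // hbase.client.pause.special

  // Returns the base pause before the next retry: the special pause when the
  // failure (or any of its causes) is a CQTBE, the normal pause otherwise.
  static long pauseFor(Throwable error, long normalPauseMs, long specialPauseMs) {
    for (Throwable cur = error; cur != null; cur = cur.getCause()) {
      if (cur instanceof CallQueueTooBigException) {
        return specialPauseMs;
      }
    }
    return normalPauseMs;
  }

  public static void main(String[] args) {
    long busy = pauseFor(new CallQueueTooBigException(),
        DEFAULT_PAUSE_MS, DEFAULT_SPECIAL_PAUSE_MS);
    long other = pauseFor(new IOException("connection reset"),
        DEFAULT_PAUSE_MS, DEFAULT_SPECIAL_PAUSE_MS);
    System.out.println("pause after CQTBE: " + busy + " ms");
    System.out.println("pause after other error: " + other + " ms");
  }
}
```

Keeping the decision in one small function means users who leave backoff disabled still get a longer pause for CQTBE, while every other retry-triggering exception keeps the existing pause.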