[jira] [Commented] (HBASE-17114) Add an option to set special retry pause when encountering CallQueueTooBigException

Yu Li (JIRA) Thu, 17 Nov 2016 19:14:15 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-17114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15675588#comment-15675588
 ]


Yu Li commented on HBASE-17114:
-------------------------------

bq. Another approach to this would be to allow the server to hint back to the 
client how long it should back off
I guess "back off" in the statement above refers to the backoff policy rather 
than the exponential backoff array? I checked the default value of 
{{ClientBackoffPolicy}} on that assumption. If you meant something else, could 
you please explain how to make the server hint back? [~ghelmling]

bq. If you want to make this overridable for some exception types, that seems 
ok, but in that case the config property for overriding the value should be 
more closely tied to the exception.
Well, the uploaded patch does tie it to CQTBE only. Introducing a new property 
is only meant to make things more flexible; of course we could use a 
hard-coded value for CQTBE instead, such as 5 times the existing pause. But 
I'd say this is a trade-off: waiting longer on CQTBE could prevent the vicious 
circle but also causes higher latency, and IMHO users should be able to control 
that trade-off. If they don't want CQTBE to be special, they can set 
{{hbase.client.pause.special}} to the same value as {{hbase.client.pause}}, 
which gives them more options.
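
To make the proposed behaviour concrete, here is a minimal sketch of how a 
client could pick the base retry pause from the exception type. It is not the 
code in the attached patch: the class {{SpecialPauseSketch}}, the helper 
{{basePauseMs}}, the plain {{Map}} used as configuration, and the local 
stand-in for {{CallQueueTooBigException}} are all assumptions made so the 
example compiles without HBase. Only the two property names and the 500ms 
default (5 times the normal default) come from this issue.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

class SpecialPauseSketch {
    /** Local stand-in for HBase's CallQueueTooBigException (assumption, so no HBase jar is needed). */
    static class CallQueueTooBigException extends IOException {
        private static final long serialVersionUID = 1L;
    }

    static final String PAUSE_KEY = "hbase.client.pause";
    static final String SPECIAL_PAUSE_KEY = "hbase.client.pause.special";
    // Defaults as described in the issue: 500ms special pause, 5 times the normal default.
    static final long DEFAULT_PAUSE_MS = 100L;
    static final long DEFAULT_SPECIAL_PAUSE_MS = 500L;

    /** Hypothetical helper: returns the base pause for the next retry, chosen by failure type. */
    static long basePauseMs(Throwable error, Map<String, String> conf) {
        if (error instanceof CallQueueTooBigException) {
            return readMs(conf, SPECIAL_PAUSE_KEY, DEFAULT_SPECIAL_PAUSE_MS);
        }
        return readMs(conf, PAUSE_KEY, DEFAULT_PAUSE_MS);
    }

    /** Reads a millisecond value from the map, falling back to the given default. */
    private static long readMs(Map<String, String> conf, String key, long defaultMs) {
        String raw = conf.get(key);
        return raw == null ? defaultMs : Long.parseLong(raw.trim());
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(basePauseMs(new CallQueueTooBigException(), conf)); // 500
        System.out.println(basePauseMs(new IOException("other"), conf));       // 100

        // Opting out: make CQTBE behave like every other retriable failure.
        conf.put(SPECIAL_PAUSE_KEY, "100");
        System.out.println(basePauseMs(new CallQueueTooBigException(), conf)); // 100
    }
}
```

The last three lines of {{main}} show the opt-out described above: setting the 
special pause equal to the normal pause removes the special treatment.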

No offense, but I'm even thinking of making the throwing of CQTBE optional, 
because in some cases users would prefer to dead-wait for the request to be 
executed in RpcServer until it times out, rather than receive an exception, 
retry, and fail again. But obviously this is another topic (Smile).
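
For illustration only, the two behaviours can be contrasted with a bounded 
queue from {{java.util.concurrent}}: reject at once when the queue is full, or 
wait up to a time-out for room. This is a sketch under assumptions and not how 
RpcServer is implemented; {{CallQueueSketch}}, {{QueueFullException}}, and the 
{{rejectWhenFull}} switch are hypothetical names and do not correspond to any 
existing HBase option.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

class CallQueueSketch {
    /** Hypothetical stand-in for the "queue too big" failure reported to the client. */
    static class QueueFullException extends RuntimeException {
        private static final long serialVersionUID = 1L;
        QueueFullException(String message) { super(message); }
    }

    private final BlockingQueue<Runnable> queue;
    private final boolean rejectWhenFull; // hypothetical switch, not an HBase setting
    private final long maxWaitMs;

    CallQueueSketch(int capacity, boolean rejectWhenFull, long maxWaitMs) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.rejectWhenFull = rejectWhenFull;
        this.maxWaitMs = maxWaitMs;
    }

    /** Queues the call, or throws if it was rejected or the wait for room timed out. */
    void enqueue(Runnable call) {
        boolean accepted;
        try {
            accepted = rejectWhenFull
                ? queue.offer(call)                                     // fail fast
                : queue.offer(call, maxWaitMs, TimeUnit.MILLISECONDS);  // wait for room
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            accepted = false;
        }
        if (!accepted) {
            throw new QueueFullException(
                rejectWhenFull ? "rejected immediately" : "gave up waiting for room");
        }
    }

    int size() { return queue.size(); }

    public static void main(String[] args) {
        CallQueueSketch failFast = new CallQueueSketch(1, true, 0L);
        failFast.enqueue(() -> { });
        try {
            failFast.enqueue(() -> { });
        } catch (QueueFullException e) {
            System.out.println(e.getMessage()); // rejected immediately
        }
    }
}
```

With {{rejectWhenFull}} set to false the same second call would instead block 
for up to {{maxWaitMs}} before failing, which is the dead-wait behaviour some 
users might prefer.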

bq. It's only special in the sense that it should not clear the client meta 
cache
Sorry, but I don't see any difference between "should not clear the client 
meta cache" and "should not retry so frequently"; both try to resolve a 
problem and make things better.

OTOH, we already have {{RetryImmediatelyException}} precisely because in some 
cases retrying w/o waiting is good, so why would retrying more slowly not be 
acceptable? Now that the retry pause is already split into "immediately" and 
"wait", I think it's OK to further split the wait case into quick and slow, 
wdyt?
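
A rough sketch of that three-way split, combined with a backoff multiplier per 
attempt, could look like the following. Everything here is an assumption for 
illustration: the class {{RetryTierSketch}}, the {{Tier}} enum, the local 
stand-ins for the two exception types, and the multiplier values, which are 
made up and not copied from HBase.

```java
import java.io.IOException;

class RetryTierSketch {
    /** Local stand-ins for the HBase exception types (assumption, so no HBase jar is needed). */
    static class RetryImmediatelyException extends IOException {
        private static final long serialVersionUID = 1L;
    }
    static class CallQueueTooBigException extends IOException {
        private static final long serialVersionUID = 1L;
    }

    enum Tier { IMMEDIATE, QUICK, SLOW }

    // Illustrative multipliers only; not the values HBase uses.
    private static final int[] BACKOFF = {1, 2, 3, 5, 10};

    /** Maps a failure to a retry tier: no wait, normal pause, or the longer special pause. */
    static Tier classify(Throwable error) {
        if (error instanceof RetryImmediatelyException) {
            return Tier.IMMEDIATE;
        }
        if (error instanceof CallQueueTooBigException) {
            return Tier.SLOW;
        }
        return Tier.QUICK;
    }

    /** Pause before the next attempt; tries is zero-based and clamped to the multiplier table. */
    static long pauseMs(Throwable error, int tries, long quickMs, long slowMs) {
        Tier tier = classify(error);
        if (tier == Tier.IMMEDIATE) {
            return 0L;
        }
        long base = (tier == Tier.SLOW) ? slowMs : quickMs;
        int index = Math.min(Math.max(tries, 0), BACKOFF.length - 1);
        return base * BACKOFF[index];
    }

    public static void main(String[] args) {
        System.out.println(pauseMs(new RetryImmediatelyException(), 3, 100L, 500L)); // 0
        System.out.println(pauseMs(new IOException("other"), 1, 100L, 500L));        // 200
        System.out.println(pauseMs(new CallQueueTooBigException(), 1, 100L, 500L));  // 1000
    }
}
```

The only difference between the quick and slow tiers is the base value fed 
into the same backoff table, so the existing retry loop would not need a 
separate code path.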

Thanks.

> Add an option to set special retry pause when encountering 
> CallQueueTooBigException
> -----------------------------------------------------------------------------------
>
>                 Key: HBASE-17114
>                 URL: https://issues.apache.org/jira/browse/HBASE-17114
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Yu Li
>            Assignee: Yu Li
>         Attachments: HBASE-17114.patch
>
>
> As titled, after HBASE-15146 we throw {{CallQueueTooBigException}} 
> instead of dead-waiting. This is good for performance in most cases but might 
> cause a side effect: if too many clients connect to a busy RS, the retry 
> requests may come over and over again and the RS never gets the chance to 
> recover. The issue becomes especially critical when the target region is 
> META.
> So in this JIRA we propose to supply a special retry pause for CQTBE 
> under the name {{hbase.client.pause.special}}; by default it will be 500ms (5 
> times the {{hbase.client.pause}} default value).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
