[ https://issues.apache.org/jira/browse/HBASE-17114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15675588#comment-15675588 ]
Yu Li commented on HBASE-17114:
-------------------------------

bq. Another approach to this would be to allow the server to hint back to the client how long it should back off

I guess "back off" in the statement above refers to the backoff policy rather than the exponential backoff array? If so, I checked the default value of {{ClientBackoffPolicy}}; if not, could you please explain how to make the server hint back? [~ghelmling]

bq. If you want to make this overridable for some exception types, that seems ok, but in that case the config property for overriding the value should be more closely tied to the exception.

Well, in the uploaded patch it is indeed tied to CQTBE only. Introducing a new property only makes things more flexible; of course we could instead use a hard-coded value for CQTBE, such as 5 times the existing pause. But I'd say this is a trade-off: waiting longer on CQTBE could prevent the vicious circle but also causes higher latency, and IMHO users should be able to control that trade-off. If they don't want CQTBE to be special, they could set {{hbase.client.pause.special}} to the same value as {{hbase.client.pause}}, which gives them more options. No offense, but I'm even thinking of making the throwing of CQTBE optional, because in some cases users would prefer the request to dead-wait in RpcServer until it times out rather than receive an exception, retry, and fail again. But obviously that is another topic (Smile).

bq. It's only special in the sense that it should not clear the client meta cache

Sorry, but I don't see any difference between "should not clear the client meta cache" and "should not retry so frequently"; both try to resolve a problem and make things better. OTOH, we already have {{RetryImmediatelyException}} because in some cases retrying without waiting is good, so why is retrying more slowly not acceptable?
Now that the retry pause is already split into immediate and wait, I think it's ok to further split the wait case into quick and slow, wdyt? Thanks.

> Add an option to set special retry pause when encountering CallQueueTooBigException
> -----------------------------------------------------------------------------------
>
>                 Key: HBASE-17114
>                 URL: https://issues.apache.org/jira/browse/HBASE-17114
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Yu Li
>            Assignee: Yu Li
>         Attachments: HBASE-17114.patch
>
>
> As titled, after HBASE-15146 we throw {{CallQueueTooBigException}} instead of dead-waiting. This is good for performance in most cases, but it might cause a side effect: if too many clients connect to a busy RS, the retry requests may come in over and over again so that the RS never gets a chance to recover. The issue becomes especially critical when the target region is META.
> So in this JIRA we propose a special retry pause for CQTBE, named {{hbase.client.pause.special}}, which by default will be 500ms (5 times the default value of {{hbase.client.pause}}).
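To make the proposal concrete, here is a minimal, dependency-free sketch of how a client could choose its retry pause. It is not the code from HBASE-17114.patch: the class {{PauseSketch}}, the method {{pauseFor}}, and the local stand-in exception class are hypothetical, and the multiplier table is assumed to resemble the client's exponential backoff array. Only the 100ms and 500ms defaults come from the description above.

```java
import java.io.IOException;

public class PauseSketch {
    // Assumed per-attempt multipliers, modeled on the client's backoff array.
    static final int[] RETRY_BACKOFF = {1, 2, 3, 5, 10, 20, 40, 100, 100, 100, 100, 200, 200};

    // Defaults taken from the issue description: 100ms normal, 500ms for CQTBE.
    static final long DEFAULT_PAUSE_MS = 100L;
    static final long DEFAULT_SPECIAL_PAUSE_MS = 500L;

    /** Local stand-in for the server's CallQueueTooBigException (hypothetical). */
    static class CallQueueTooBigException extends IOException {
        private static final long serialVersionUID = 1L;
    }

    /**
     * Picks the base pause by exception type, then scales it by the
     * multiplier for the current attempt. Attempts beyond the end of the
     * table reuse the last multiplier.
     */
    static long pauseFor(Throwable error, int tries, long normalPauseMs, long specialPauseMs) {
        long base = (error instanceof CallQueueTooBigException) ? specialPauseMs : normalPauseMs;
        int idx = Math.min(Math.max(tries, 0), RETRY_BACKOFF.length - 1);
        return base * RETRY_BACKOFF[idx];
    }

    public static void main(String[] args) {
        Throwable busy = new CallQueueTooBigException();
        Throwable other = new IOException("some other failure");
        for (int tries = 0; tries < 4; tries++) {
            System.out.println("tries=" + tries
                + " CQTBE pause=" + pauseFor(busy, tries, DEFAULT_PAUSE_MS, DEFAULT_SPECIAL_PAUSE_MS) + "ms"
                + " other pause=" + pauseFor(other, tries, DEFAULT_PAUSE_MS, DEFAULT_SPECIAL_PAUSE_MS) + "ms");
        }
    }
}
```

Passing the same value for both pauses makes CQTBE behave like any other retriable exception, which is the opt-out described in the comment.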