[
https://issues.apache.org/jira/browse/HBASE-21885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766735#comment-16766735
]
Duo Zhang commented on HBASE-21885:
-----------------------------------
[~sershe] [~stack] FYI.
> Cancel remote procedure call if the remote procedure is succeeded
> -----------------------------------------------------------------
>
> Key: HBASE-21885
> URL: https://issues.apache.org/jira/browse/HBASE-21885
> Project: HBase
> Issue Type: Improvement
> Components: proc-v2
> Reporter: Duo Zhang
> Priority: Major
>
> I used to think it would happen only very rarely that a region server can
> report back to the master while the master cannot get the response from the
> region server, and only when there are strange network errors. But when
> implementing HBASE-21875, I found a way to reproduce the problem without any
> strange network issues.
> The first time, we send the request to the region server and it accepts the
> request, but before it returns, a network error breaks the connection, so
> the master tries to send the request to the region server again.
> But then the region server gets too busy and always returns
> CallQueueTooBigException, so the master retries forever, even though the
> region has already been opened on the region server.
> And this does not only waste resources: later we may close the region
> on the region server, and if the region server is back, it will receive an
> open region request and a close region request at the same time. Not sure if
> this will cause any problems, but at least we haven't thought about this
> condition yet.
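
The retry scenario described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual proc-v2 code: `RemoteCallRetrySketch`, `RemoteCall`, `CallQueueTooBig`, and `sendWithCancel` are hypothetical names, and the region server's report is modelled as a simple flag that the retry loop checks before each attempt.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

public class RemoteCallRetrySketch {

    /** Hypothetical stand-in for HBase's CallQueueTooBigException. */
    static class CallQueueTooBig extends RuntimeException {}

    /** Hypothetical remote call from the master to a region server. */
    interface RemoteCall {
        void send();
    }

    /**
     * Retries the call until it returns normally, the remote procedure is
     * known to have succeeded, or maxAttempts is reached.
     * Returns the number of requests actually sent.
     */
    static int sendWithCancel(RemoteCall call, BooleanSupplier alreadySucceeded,
                              int maxAttempts) {
        int attempts = 0;
        while (attempts < maxAttempts) {
            if (alreadySucceeded.getAsBoolean()) {
                break; // the region server already reported success: cancel the retry
            }
            attempts++;
            try {
                call.send();
                break; // got a response, nothing left to retry
            } catch (RuntimeException e) {
                // broken connection or call queue too big: loop and try again
            }
        }
        return attempts;
    }

    public static void main(String[] args) {
        AtomicBoolean reported = new AtomicBoolean(false);
        int[] calls = {0};
        RemoteCall call = () -> {
            calls[0]++;
            if (calls[0] == 1) {
                // First request is accepted and the region is opened and reported,
                // but the response is lost. (Simplification: the report is
                // modelled as arriving immediately.)
                reported.set(true);
                throw new RuntimeException("connection broken");
            }
            throw new CallQueueTooBig(); // the server is too busy from now on
        };
        System.out.println("attempts = " + sendWithCancel(call, reported::get, 100));
    }
}
```

Without the `alreadySucceeded` check, the loop in this sketch would keep sending requests until `maxAttempts` is exhausted, which corresponds to the retry-forever behaviour described in the issue.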