[
https://issues.apache.org/jira/browse/HBASE-21885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Duo Zhang resolved HBASE-21885.
-------------------------------
Resolution: Not A Problem
I think this is no longer a problem after HBASE-22074: the procedure id is now
carried in the reportRegionStateTransition call, so we will simply ignore the
retried request because its procedure id does not match.
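To illustrate the idea of matching on the procedure id, here is a minimal
sketch. It is not the actual HBase code: the class TransitionReportFilter, its
methods, and the region-name keys are hypothetical names made up for this
example. It only assumes that master tracks which procedure currently owns a
region and drops any report carrying a different procedure id.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not an HBase class): master-side bookkeeping that
 * accepts a region state transition report only when the procedure id in
 * the report matches the procedure currently driving that region.
 */
public class TransitionReportFilter {
    // region name -> id of the procedure currently driving that region
    private final Map<String, Long> activeProcByRegion = new HashMap<>();

    /** Record that the given procedure now owns the region. */
    public void startProcedure(String region, long procId) {
        activeProcByRegion.put(region, procId);
    }

    /**
     * Returns true if the report is accepted, false if it is ignored
     * because no procedure is active or the procedure id does not match.
     */
    public boolean report(String region, long reportedProcId) {
        Long active = activeProcByRegion.get(region);
        if (active == null || active != reportedProcId) {
            // Stale report, e.g. triggered by a retried request that
            // belongs to an older procedure.
            return false;
        }
        // Matching report: the procedure is done with this region.
        activeProcByRegion.remove(region);
        return true;
    }
}
```

In this sketch, a report caused by a late retry of an earlier open request
carries the old procedure id and is dropped, while the report for the current
procedure is still accepted.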
> Cancel remote procedure call if the remote procedure is succeeded
> -----------------------------------------------------------------
>
> Key: HBASE-21885
> URL: https://issues.apache.org/jira/browse/HBASE-21885
> Project: HBase
> Issue Type: Improvement
> Components: proc-v2
> Reporter: Duo Zhang
> Priority: Major
>
> I used to think it could only rarely happen that a region server reports
> back to master but master cannot get the response from the region server,
> and only when there are strange network errors. But when implementing
> HBASE-21875, I found a way to reproduce the problem without any strange
> network issues.
> The first time, we send the request to the region server and it accepts the
> request, but before it returns, a network error breaks the connection, so
> master tries to send the request to the region server again.
> But then the region server gets too busy and always returns
> CallQueueTooBigException, so master will retry forever, even if the
> region has already been opened on the region server.
> And this does not only waste resources: later we may close the region
> on the region server, and if the region server is back, we will receive an
> open region request and a close region request at the same time. Not sure if
> this will cause any problems, but at least we have not thought about this
> condition yet.