Duo Zhang created HBASE-21885:
---------------------------------
Summary: Cancel remote procedure call if the remote procedure is
succeeded
Key: HBASE-21885
URL: https://issues.apache.org/jira/browse/HBASE-21885
Project: HBase
Issue Type: Improvement
Components: proc-v2
Reporter: Duo Zhang
I used to think it would only rarely happen that a region server can report
back to the master while the master cannot get the response from the region
server, and only when there are strange network errors. But when implementing
HBASE-21875, I found a way to reproduce the problem without any strange network
issues.
The first time, we send the request to the region server and it accepts the
request, but before it returns, a network error breaks the connection, so the
master tries to send the request to the region server again. But then the
region server gets too busy and always returns CallQueueTooBigException, so the
master retries forever, even if the region has already been opened on the
region server.
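
To make the proposed behavior concrete, here is a minimal sketch of a retry
loop that gives up once the remote procedure has been reported as succeeded.
It is not the actual proc-v2 code: RemoteCall, ServerBusyException (a stand-in
for a retryable failure such as CallQueueTooBigException), sendWithCancel, and
the alreadySucceeded check are all hypothetical names made up for illustration.

```java
import java.util.function.BooleanSupplier;

final class RemoteCallRetrySketch {
    /** Hypothetical stand-in for a retryable failure such as CallQueueTooBigException. */
    static class ServerBusyException extends Exception {}

    /** Hypothetical remote call; this is not an HBase interface. */
    interface RemoteCall {
        void send() throws ServerBusyException;
    }

    /**
     * Retries the call until it goes through, the remote procedure is reported
     * as succeeded, or maxAttempts is reached. Returns the number of send attempts.
     */
    static int sendWithCancel(RemoteCall call, BooleanSupplier alreadySucceeded, int maxAttempts) {
        int attempts = 0;
        while (attempts < maxAttempts) {
            // The region server may have reported success through another path
            // even though we never saw the response to our first request.
            if (alreadySucceeded.getAsBoolean()) {
                break; // cancel the retry instead of resending
            }
            attempts++;
            try {
                call.send();
                break; // request accepted, stop retrying
            } catch (ServerBusyException e) {
                // Server too busy: loop and try again (a real implementation would back off).
            }
        }
        return attempts;
    }
}
```

The point of the sketch is only that the success check happens before every
resend, so a busy region server can no longer keep the master retrying a
request whose work is already done.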
And this not only wastes resources: later we may close the region on the
region server, and if the region server is back, it will receive an open
region request and a close region request at the same time. Not sure if this
will cause any problems, but at least we haven't thought about this condition
yet.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)