nkeywal created HBASE-7815:
------------------------------
Summary: Too subtle behavior for HConnection#getRegionLocation
reload parameter and performance risk
Key: HBASE-7815
URL: https://issues.apache.org/jira/browse/HBASE-7815
Project: HBase
Issue Type: Bug
Components: Client, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Priority: Minor
HConnection#getRegionLocation(table, row, reload=true) and
HConnection#getRegionLocation(table, row, reload=false) are not equivalent when
the cache is empty: the first will check the table status while the second will
not.
As a consequence, the client won't get the same exception if the table is
disabled. With reload==true, we get a DoNotRetryIOException with a
message saying that the table is disabled. With reload==false, we get a
NotServingRegionException. This is hard to guess, as it's not mentioned in
the javadoc.
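To make the asymmetry concrete, here is a toy model in Java. It is only a sketch of the behavior described above: ToyConnection, callServer, failureFor and the two exception classes are hypothetical stand-ins written for this illustration, not the real HBase client classes, and the lookup always assumes an empty cache.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

// Toy model only: these classes are local stand-ins, not the real HBase client.
final class ReloadAsymmetrySketch {

    /** Stand-in for DoNotRetryIOException. */
    static class DoNotRetrySketchException extends IOException {
        DoNotRetrySketchException(String message) { super(message); }
    }

    /** Stand-in for the "not serving" failure seen when reload == false. */
    static class NotServingSketchException extends IOException {
        NotServingSketchException(String message) { super(message); }
    }

    static final class ToyConnection {
        final Set<String> disabledTables = new HashSet<>();
        int tableStateChecks = 0; // simulated ZooKeeper lookups

        /** Empty-cache lookup: only reload == true checks the table state. */
        String getRegionLocation(String table, String row, boolean reload) throws IOException {
            if (reload) {
                tableStateChecks++;
                if (disabledTables.contains(table)) {
                    throw new DoNotRetrySketchException(table + " is disabled");
                }
            }
            return "toy-server-for-" + table;
        }

        /** Simulated RPC: the regions of a disabled table are not served. */
        void callServer(String table, String location) throws IOException {
            if (disabledTables.contains(table)) {
                throw new NotServingSketchException(
                        "region of " + table + " not served by " + location);
            }
        }
    }

    /** Runs locate + call and returns the simple name of the exception, or "none". */
    static String failureFor(ToyConnection conn, String table, boolean reload) {
        try {
            String location = conn.getRegionLocation(table, "row", reload);
            conn.callServer(table, location);
            return "none";
        } catch (IOException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        ToyConnection conn = new ToyConnection();
        conn.disabledTables.add("t1");
        System.out.println("reload=true  -> " + failureFor(conn, "t1", true));
        System.out.println("reload=false -> " + failureFor(conn, "t1", false));
    }
}
```

In this model the caller sees a different exception type for the same disabled table depending only on the reload flag, and only the reload==true path increments the table-state check counter.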
The second effect is that the client goes to ZooKeeper to check the table
state. In ServerCallable, if the first try is not successful, every subsequent
try goes to ZK to check this status. So if a region server stops, all its
clients will connect to ZK, possibly multiple times if the recovery takes some
time. With a few hundred clients, it's not very nice...
I'm not sure of the solution. A possible improvement in ServerCallable would be
to do a reload only at the first retry instead of all of them, but:
- it's not without side effects, even if it's limited
- the real cost is the first try, as it may create a ZK connection.
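As a rough illustration of the first idea, the two policies can be compared by counting how many tries would force a reload when every try fails. This is a sketch, not the ServerCallable code: the class, the two predicates and countReloads are hypothetical names, and it assumes the current behavior is "reload on every try after the first", as described above.

```java
import java.util.function.IntPredicate;

// Sketch of two retry policies; not the actual ServerCallable code.
final class ReloadPolicySketch {

    /** Behavior described in the issue: every try after the first forces a reload. */
    static final IntPredicate RELOAD_ON_EVERY_RETRY = tryIndex -> tryIndex > 0;

    /** Proposed variant: only the first retry (try index 1) forces a reload. */
    static final IntPredicate RELOAD_ON_FIRST_RETRY_ONLY = tryIndex -> tryIndex == 1;

    /** Counts reloads (each implying a table-state check) if all tries fail. */
    static int countReloads(int maxTries, IntPredicate reloadPolicy) {
        int reloads = 0;
        for (int tryIndex = 0; tryIndex < maxTries; tryIndex++) {
            if (reloadPolicy.test(tryIndex)) {
                reloads++;
            }
        }
        return reloads;
    }

    public static void main(String[] args) {
        int maxTries = 10; // illustrative value only
        System.out.println("every retry: "
                + countReloads(maxTries, RELOAD_ON_EVERY_RETRY));
        System.out.println("first retry only: "
                + countReloads(maxTries, RELOAD_ON_FIRST_RETRY_ONLY));
    }
}
```

Under these assumptions the number of table-state checks per client drops from one per retry to one in total, but the first retry still pays for the possible ZK connection, which is the second caveat above.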
Another thing to do would be to limit the reload to the cases where it makes
sense. In locateRegionInMeta there is a test on the exception: (e instanceof
RegionOfflineException || e instanceof NoServerForRegionException).
Maybe this logic could be put in ServerCallable as well, but we need to cover
all cases.
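A minimal sketch of what such a check could look like if it were lifted into the retry logic. The two exception classes below are local stand-ins named after the ones in the test above, and shouldReload is a hypothetical helper; the real set of exceptions that should trigger a reload is exactly the open question.

```java
import java.io.IOException;

// Sketch only: local stand-in exceptions, not the HBase classes.
final class ReloadOnExceptionSketch {

    /** Stand-in for RegionOfflineException. */
    static class RegionOfflineSketchException extends IOException {}

    /** Stand-in for NoServerForRegionException. */
    static class NoServerForRegionSketchException extends IOException {}

    /** Reload the location only for failures that suggest a stale location. */
    static boolean shouldReload(Throwable e) {
        return e instanceof RegionOfflineSketchException
                || e instanceof NoServerForRegionSketchException;
    }

    public static void main(String[] args) {
        System.out.println(shouldReload(new RegionOfflineSketchException()));
        System.out.println(shouldReload(new IOException("connection reset")));
    }
}
```

Any other failure, such as a plain IOException, would then retry against the cached location without a table-state check.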