nkeywal created HBASE-7815:
------------------------------

             Summary: Too subtle behavior for HConnection#getRegionLocation 
reload parameter and performance risk
                 Key: HBASE-7815
                 URL: https://issues.apache.org/jira/browse/HBASE-7815
             Project: HBase
          Issue Type: Bug
          Components: Client, regionserver
    Affects Versions: 0.96.0
            Reporter: nkeywal
            Priority: Minor


HConnection#getRegionLocation(table, row, reload=true) and 
HConnection#getRegionLocation(table, row, reload=false) are not equivalent when 
the cache is empty: the first will check the table status while the second will 
not.


As a consequence, the client does not get the same exception when the table is 
disabled. With reload==true, we get a DoNotRetryIOException with a message 
saying that the table is disabled. With reload==false, we get a 
NotServingRegionException. This is hard to guess, as it is not mentioned in 
the javadoc.


The second effect is that the client goes to ZooKeeper to check the table 
state. In ServerCallable, if the first try is not successful, every later try 
goes to ZK to check this status. So if a region server stops, all its 
clients will connect to ZK, possibly multiple times if the recovery takes some 
time. With a few hundred clients, it's not very nice...

I'm not sure of the solution. A possible improvement in ServerCallable would be 
to do a reload only at the first retry instead of all of them, but:
- it's not without side effects, even if it's limited
- the real cost is the first try, as it may create a ZK connection.
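
To make the cost concrete, here is a toy sketch comparing the two retry 
policies. Nothing in it is HBase code: FakeLocator, countChecks and the 
"server-1" value are hypothetical stand-ins, and a table-state check is 
modeled as a simple counter rather than a real ZooKeeper round trip. It 
assumes the current loop reloads on every try after the first, as described 
above.

```java
/** Toy model only: none of these types are HBase classes. */
class ReloadRetrySketch {

    /** Hypothetical locator that counts simulated table-state checks. */
    static class FakeLocator {
        int tableStateChecks = 0;

        String locate(boolean reload) {
            if (reload) {
                tableStateChecks++; // stands in for the ZooKeeper lookup
            }
            return "server-1"; // placeholder location
        }
    }

    /**
     * Simulates `failures` consecutive failed attempts by one client.
     * reloadEveryRetry == true  models the behavior described in this report.
     * reloadEveryRetry == false models the proposal: reload on the first retry only.
     */
    static int countChecks(int failures, boolean reloadEveryRetry) {
        FakeLocator locator = new FakeLocator();
        for (int tries = 0; tries < failures; tries++) {
            boolean reload = reloadEveryRetry ? tries > 0 : tries == 1;
            locator.locate(reload);
            // the call to the region server is assumed to fail here
        }
        return locator.tableStateChecks;
    }

    public static void main(String[] args) {
        System.out.println(countChecks(10, true));  // prints 9
        System.out.println(countChecks(10, false)); // prints 1
    }
}
```

The counts are per client, so they multiply by the number of clients of the 
stopped region server. The sketch does not model the second point above: the 
single remaining check may still be the expensive one if it has to create the 
ZK connection.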

Another thing to do would be to limit the reload to the cases where it makes 
sense. In locateRegionInMeta there is a test on the exception: (e instanceof 
RegionOfflineException || e instanceof NoServerForRegionException).

Maybe this logic could be put in ServerCallable as well, but we need to cover 
all cases.
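
A minimal sketch of what such a test could look like once extracted into a 
helper. The two exception classes below are local stand-ins named after the 
ones quoted above, not the real HBase classes, and shouldReload is a 
hypothetical name. The set of exceptions is only the one from the quoted 
test and is probably incomplete.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

/** Toy model only: the exception types are local stand-ins, not HBase classes. */
class ReloadDecisionSketch {

    static class RegionOfflineException extends IOException {}

    static class NoServerForRegionException extends IOException {}

    /** Mirrors the instanceof test quoted from locateRegionInMeta. */
    static boolean shouldReload(Throwable e) {
        return e instanceof RegionOfflineException
            || e instanceof NoServerForRegionException;
    }

    public static void main(String[] args) {
        // An exception from the quoted test triggers a reload.
        System.out.println(shouldReload(new RegionOfflineException())); // prints true
        // Any other failure, here a timeout, does not.
        System.out.println(shouldReload(new SocketTimeoutException())); // prints false
    }
}
```

With a helper like this, the retry loop would pass reload=true only when the 
previous failure says the cached location is wrong, instead of on every 
retry.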

