[
https://issues.apache.org/jira/browse/HBASE-13850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
huaxiang sun reassigned HBASE-13850:
------------------------------------
Assignee: huaxiang sun
> Check for dead server on CallTimeoutException
> ---------------------------------------------
>
> Key: HBASE-13850
> URL: https://issues.apache.org/jira/browse/HBASE-13850
> Project: HBase
> Issue Type: Improvement
> Components: Client, MTTR
> Affects Versions: 2.0.0, 1.2.0
> Reporter: Matteo Bertozzi
> Assignee: huaxiang sun
> Priority: Minor
> Attachments: HBASE-13850-v0.patch, TestGetPerf.java
>
>
> WARNING: this may be a misconfiguration, so let me know if there is a
> timeout param I should set.
> {noformat}
> hbase-site.xml
> zookeeper.session.timeout 10000
> hbase.regionserver.storefile.refresh.period 10000
> hbase.client.operation.timeout 5000
> hbase.client.meta.operation.timeout 5000
> hbase.client.scanner.timeout.period 10000
> hbase.regionserver.lease.period 10000
> {noformat}
> I have a test that does a kill -STOP on an RS and then tries to query it.
> From the conf, the ZK lease is 10 sec; the master correctly does the
> reassign after 10 sec, and meta is updated.
> The client keeps trying to query the RS for a specific row until it gets a
> response. The table.get(row) call in the loop throws a
> CallTimeoutException every 5 sec (which matches the configured timeout),
> but instead of succeeding after 2 or 3 retries (> 10 sec, by which point
> the master has reassigned), it keeps retrying for up to 60 sec. (I don't
> know where that 60 sec comes from; maybe a conf param that I'm not able
> to find.)
> One simple fix in the code is to handle the CallTimeoutException in
> RegionServerCallable and clear the meta cache for the RS that is not
> responding. (But maybe there is already a conf param that reduces that
> 60 sec period.)
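
The proposed fix (drop the cached location when a call times out, so the next retry looks the region up again) can be illustrated with a small self-contained simulation. This is only a sketch under stated assumptions: TimeoutRetrySketch, CallTimeout, meta, cache, locate, call and get are hypothetical names invented for the illustration. None of this is HBase client code or the attached patch, and the two maps merely stand in for hbase:meta and the client-side meta cache.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simulation of "clear the cached location on call timeout".
// Nothing here is HBase API; all names are illustrative assumptions.
final class TimeoutRetrySketch {
    /** Unchecked stand-in for a call timeout. */
    static final class CallTimeout extends RuntimeException {
        CallTimeout(String msg) { super(msg); }
    }

    // Authoritative region -> server mapping (plays the role of meta).
    static final Map<String, String> meta = new HashMap<>();
    // Client-side copy of that mapping; it can go stale after a reassign.
    static final Map<String, String> cache = new HashMap<>();
    // Server that no longer answers (for example, stopped with kill -STOP).
    static String deadServer = null;
    // Number of attempts used by the most recent get().
    static int lastAttempts = 0;

    /** Returns the cached server for a region, consulting meta only on a miss. */
    static String locate(String region) {
        String server = cache.computeIfAbsent(region, meta::get);
        if (server == null) {
            throw new IllegalArgumentException("unknown region: " + region);
        }
        return server;
    }

    /** Simulated RPC: times out when sent to the dead server. */
    static String call(String server, String row) {
        if (server.equals(deadServer)) {
            throw new CallTimeout("call to " + server + " timed out");
        }
        return row + "@" + server;
    }

    /** Retries up to maxRetries times, optionally dropping the cached location on timeout. */
    static String get(String region, String row, int maxRetries, boolean clearCacheOnTimeout) {
        CallTimeout last = new CallTimeout("no attempts made");
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            lastAttempts = attempt;
            String server = locate(region);
            try {
                return call(server, row);
            } catch (CallTimeout e) {
                last = e;
                if (clearCacheOnTimeout) {
                    cache.remove(region); // force a fresh lookup on the next attempt
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        meta.put("r1", "rs1");
        locate("r1");          // client caches r1 -> rs1
        deadServer = "rs1";    // rs1 stops responding
        meta.put("r1", "rs2"); // the region is reassigned, meta is updated
        System.out.println(get("r1", "row", 3, true) + " after " + lastAttempts + " attempts");
    }
}
```

With clearCacheOnTimeout set to false, the same scenario exhausts every retry against the stale location, which is the behaviour described in the report. In a real client the timeout may also come from a slow but healthy server, so clearing the cache on every timeout trades an extra meta lookup for faster recovery.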