[
https://issues.apache.org/jira/browse/HBASE-8321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628359#comment-13628359
]
Jeffrey Zhong commented on HBASE-8321:
--------------------------------------
The underlying recoverLease and getReader operations are each a single HDFS
function call, so it is hard to heartbeat inside them. Last week I talked about
this with the HDFS folks. It seems that recoverLease is an NN operation, so it
takes about 1 minute in the worst cases (RPC/socket timeout), and getReader
takes about the same if we go directly to a bad data node (again bounded by the
RPC/socket timeout value). Two minutes should be good enough in most cases.
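One possible way to heartbeat around such a blocking call, rather than inside
it, is to run the call on a helper thread and report progress from the waiting
thread. The following is only a minimal sketch under that assumption, not what
HBase does: HeartbeatingCall, callWithHeartbeat, and the heartbeat callback are
hypothetical names, and the Callable stands in for recoverLease or getReader.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of HBase or HDFS.
final class HeartbeatingCall {
    /**
     * Runs blockingCall on a helper thread and invokes heartbeat each time
     * intervalMs elapses without the call having finished.
     */
    static <T> T callWithHeartbeat(Callable<T> blockingCall, Runnable heartbeat,
                                   long intervalMs) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<T> future = executor.submit(blockingCall);
            while (true) {
                try {
                    // Wait at most one interval for the blocking call to finish.
                    return future.get(intervalMs, TimeUnit.MILLISECONDS);
                } catch (TimeoutException stillRunning) {
                    // Call has not returned yet: tell the manager we are alive.
                    heartbeat.run();
                }
            }
        } finally {
            // Interrupt the helper thread if we leave early (e.g. on an exception).
            executor.shutdownNow();
        }
    }
}
```

A failure inside the blocking call surfaces as an ExecutionException from
future.get, so the caller still sees the error. Note that this only keeps the
task from being timed out; it does not make the underlying call any faster.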
In addition, HDFS has a 30-second (default) timeout for marking a bad data node
as stale, so other workers (those that take over the timed-out task) have a
good chance of proceeding in much less time.
IMHO, since we have HBASE-6738, we can make the default value even longer,
e.g. 15 minutes (I set it to 5 minutes in 0.94), to cover normal cases. For
extreme situations, people can adjust the config setting accordingly.
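The arithmetic behind these numbers can be written down as a small check. This
is only a sketch: SplitTimeoutBudget and its members are hypothetical names,
the one-minute figures are the rough worst-case estimates quoted above rather
than measured values, and the setting being tuned is assumed to be
hbase.splitlog.manager.timeout.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of HBase.
final class SplitTimeoutBudget {
    // Rough worst-case estimates from the discussion above (assumptions).
    static final long RECOVER_LEASE_WORST_MS = TimeUnit.MINUTES.toMillis(1);
    static final long GET_READER_WORST_MS = TimeUnit.MINUTES.toMillis(1);

    /** Worst-case time spent blocked in the two HDFS calls, in milliseconds. */
    static long worstCaseBlockingMs() {
        return RECOVER_LEASE_WORST_MS + GET_READER_WORST_MS;
    }

    /** True if the given task timeout leaves room beyond the worst-case blocking time. */
    static boolean coversWorstCase(long timeoutMs) {
        return timeoutMs > worstCaseBlockingMs();
    }
}
```

With these estimates the blocking budget is 120000 ms, so both a 5-minute and
a 15-minute timeout cover it, while a timeout of two minutes or less would not.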
> Log split worker should heartbeat to avoid timeout
> --------------------------------------------------
>
> Key: HBASE-8321
> URL: https://issues.apache.org/jira/browse/HBASE-8321
> Project: HBase
> Issue Type: Bug
> Components: wal
> Reporter: Jimmy Xiang
> Assignee: Jimmy Xiang
>
> Currently, the hlog splitter could spend quite some time splitting a log if
> there is any HDFS issue and recoverLease/retry opening is needed. If the
> distributed log split manager times out the log worker, the other log worker
> that takes over will run into the same issue.
> Ideally, we should not need a timeout monitor. Since we have a timeout
> monitor for DSL now, the worker should heartbeat to avoid wrong/unneeded
> timeouts.