[jira] [Commented] (HBASE-8321) Log split worker should heartbeat to avoid timeout

Jeffrey Zhong (JIRA) Wed, 10 Apr 2013 15:21:23 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-8321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628359#comment-13628359
 ]


Jeffrey Zhong commented on HBASE-8321:
--------------------------------------

recoverLease and getReader are each a single underlying HDFS function call, 
so it is hard to heartbeat inside them. Last week I talked about this with the 
HDFS folks. It seems that recoverLease is a NameNode (NN) operation, so in the 
worst case it takes about 1 minute (the rpc/socket timeout), and getReader 
takes about the same if we go directly to a bad data node (again the 
rpc/socket timeout value). Two minutes should be good enough in most cases.
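
The estimate above can be written out as a small calculation. This is only a 
sketch: the class and method names are made up for illustration, and the two 
one-minute figures are the rough worst-case numbers from this comment, not 
measured or configured values.

```java
import java.time.Duration;

public class SplitTimeoutBudget {
    // Rough worst-case figures quoted in the comment (assumptions, not measurements).
    static final Duration RECOVER_LEASE_WORST = Duration.ofMinutes(1);
    static final Duration GET_READER_WORST = Duration.ofMinutes(1);

    /** Worst case for one attempt: both blocking HDFS calls run into their rpc/socket timeout. */
    public static Duration worstCaseBlocking() {
        return RECOVER_LEASE_WORST.plus(GET_READER_WORST);
    }

    /** True if the manager timeout is at least as long as the worst-case blocking time. */
    public static boolean covers(Duration managerTimeout) {
        return managerTimeout.compareTo(worstCaseBlocking()) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(worstCaseBlocking().toMinutes());   // 2
        System.out.println(covers(Duration.ofMinutes(5)));     // true
        System.out.println(covers(Duration.ofSeconds(30)));    // false
    }
}
```

Any manager timeout shorter than the sum can expire while the worker is still 
stuck inside one of the two calls, with no chance to heartbeat.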

In addition, HDFS has a 30-second (default) timeout for marking a bad data 
node stale, so other workers (those that take over the timed-out task) have a 
good chance of proceeding in much less time.

IMHO, since we have HBASE-6738, we can make the default value even longer, 
like 15 minutes (I set it to 5 minutes in 0.94), to cover normal cases. For 
extreme situations, people can adjust the config setting accordingly.
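
A minimal sketch of the "longer default, overridable by config" idea, using 
plain java.util.Properties rather than the HBase Configuration class so that it 
runs on its own. The property name below is my assumption of the relevant 
setting and should be checked against the HBase version in use; the class 
name and the 15-minute default are illustrative only.

```java
import java.util.Properties;

public class SplitLogTimeoutConfig {
    // Assumed property name, for illustration; verify against your HBase version.
    static final String TIMEOUT_KEY = "hbase.splitlog.manager.timeout";
    // 15 minutes in milliseconds, the longer default suggested in the comment.
    static final long PROPOSED_DEFAULT_MS = 15L * 60 * 1000;

    /** Returns the configured timeout in ms, or the proposed default when unset. */
    public static long timeoutMs(Properties conf) {
        String raw = conf.getProperty(TIMEOUT_KEY);
        return raw == null ? PROPOSED_DEFAULT_MS : Long.parseLong(raw.trim());
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(timeoutMs(conf));        // default applies

        // An operator with an extreme situation overrides the default.
        conf.setProperty(TIMEOUT_KEY, "300000");    // 5 minutes
        System.out.println(timeoutMs(conf));
    }
}
```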
                
> Log split worker should heartbeat to avoid timeout
> --------------------------------------------------
>
>                 Key: HBASE-8321
>                 URL: https://issues.apache.org/jira/browse/HBASE-8321
>             Project: HBase
>          Issue Type: Bug
>          Components: wal
>            Reporter: Jimmy Xiang
>            Assignee: Jimmy Xiang
>
> Currently, the hlog splitter could spend quite some time splitting a log if 
> there is any HDFS issue and recoverLease/retrying the open is needed.  If the 
> distributed log split manager times out the log worker, the other log worker 
> that takes over will run into the same issue.
> Ideally, we should not need a timeout monitor.  Since we have a timeout 
> monitor for DSL now, the worker should heartbeat to avoid wrong/unneeded 
> timeouts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
