[ 
https://issues.apache.org/jira/browse/HBASE-8321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13633070#comment-13633070
 ] 

Jeffrey Zhong commented on HBASE-8321:
--------------------------------------

The second patch looks good to me (+1), with one small comment:
{code}
+    report_period = conf.getInt("hbase.splitlog.report.period",
+      conf.getInt("hbase.splitlog.manager.timeout",
+        SplitLogManager.DEFAULT_TIMEOUT) / 2);

....

         public boolean progress() {
-          if (!attemptToOwnTask(false)) {
-            LOG.warn("Failed to heartbeat the task" + currentTask);
-            return false;
+          long t = EnvironmentEdgeManager.currentTimeMillis();
+          if ((t - last_report_at) > report_period) {
+            last_report_at = t;
+            if (!attemptToOwnTask(false)) {
+              LOG.warn("Failed to heartbeat the task" + currentTask);
+              return false;
+            }
{code}

In the latest patch, the worker heartbeats only once a report_period has 
elapsed, and that period defaults to half of the SplitLogManager timeout. If 
the SplitLogWorker misses one report (e.g. progress() is called just before a 
report_period has elapsed) and the next call comes slightly more than one 
report_period later, the gap between heartbeats exceeds the timeout and the 
task will be preempted by SplitLogManager. Therefore, I'd suggest changing the 
default report_period to one fifth of the SplitLogManager timeout, or whatever 
value you think is more appropriate.
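
To make the timing argument concrete, here is a rough sketch that replays the gating logic from the patch against a list of progress() call times and reports the longest gap between two heartbeats. The class name, the 25,000 ms timeout, and the call times are made up for illustration, and the sketch assumes the last heartbeat happened at t = 0 when the task was acquired; it is not the actual SplitLogWorker code.

```java
public class HeartbeatGapSketch {

    /**
     * Replays the "(t - last_report_at) > report_period" check from the patch
     * and returns the longest interval between two consecutive heartbeats.
     * Assumption: a heartbeat was sent at t = 0 (task acquisition).
     */
    static long maxHeartbeatGap(long reportPeriod, long[] progressCallTimes) {
        long lastReportAt = 0;
        long maxGap = 0;
        for (long t : progressCallTimes) {
            if (t - lastReportAt > reportPeriod) {
                // This call actually heartbeats; record how long the manager waited.
                maxGap = Math.max(maxGap, t - lastReportAt);
                lastReportAt = t;
            }
            // Otherwise the call is skipped, exactly as in the patched progress().
        }
        return maxGap;
    }

    public static void main(String[] args) {
        long timeout = 25_000; // hypothetical manager timeout in ms
        // progress() is called just before timeout/2, then a bit more than
        // one timeout/2 period later.
        long[] calls = {12_400, 25_100};

        long gapHalf = maxHeartbeatGap(timeout / 2, calls);
        long gapFifth = maxHeartbeatGap(timeout / 5, calls);

        System.out.println("period = timeout/2: max gap " + gapHalf
            + " ms, exceeds timeout: " + (gapHalf > timeout));
        System.out.println("period = timeout/5: max gap " + gapFifth
            + " ms, exceeds timeout: " + (gapFifth > timeout));
    }
}
```

With these numbers, the timeout/2 default skips the first call and leaves a 25,100 ms gap, just past the 25,000 ms timeout, while timeout/5 heartbeats on both calls and keeps the largest gap at 12,700 ms.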

                
> Log split worker should heartbeat to avoid timeout when the hlog is under 
> recovery
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-8321
>                 URL: https://issues.apache.org/jira/browse/HBASE-8321
>             Project: HBase
>          Issue Type: Bug
>          Components: wal
>            Reporter: Jimmy Xiang
>            Assignee: Jimmy Xiang
>         Attachments: trunk-8321_v1.patch, trunk-8321_v2.patch
>
>
> Currently, the hlog splitter can spend quite some time splitting a log when 
> there is an HDFS issue and recoverLease/retrying the open is needed.  If the 
> distributed log split manager times out the log worker, the other log worker 
> that takes over will run into the same issue.
> Ideally, we should not need a timeout monitor.  Since we have a timeout 
> monitor for DSL now, the worker should heartbeat to avoid wrong/unneeded 
> timeouts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
