[ 
https://issues.apache.org/jira/browse/HBASE-6878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13464874#comment-13464874
 ] 

Prakash Khemani commented on HBASE-6878:
----------------------------------------

The logic to indefinitely retry a failing log-splitting task is not inside 
SplitLogManager. SplitLogManager retries a task only a finite number of times. 
If the task still fails, it is the outer Master layers that retry indefinitely. 
The reason for this behavior is to allow tools to be built around distributed 
log splitting: if a tool were using distributed log splitting, you wouldn't 
want it to retry indefinitely.
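
To illustrate the layering described above, here is a minimal sketch in which 
an inner layer gives up after a bounded number of attempts and only the outer 
caller loops until success. The class and method names (RetryLayers, 
splitWithBoundedRetries, retryUntilSuccess) are hypothetical and are not taken 
from the HBase code base.

```java
import java.util.function.BooleanSupplier;

public class RetryLayers {

    /**
     * Inner layer (plays the role of the bounded retry): runs the task at
     * most maxAttempts times and reports failure instead of looping forever.
     */
    public static boolean splitWithBoundedRetries(BooleanSupplier task, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (task.getAsBoolean()) {
                return true;
            }
        }
        return false;
    }

    /**
     * Outer layer (plays the role of the master-side caller): keeps invoking
     * the bounded layer until it succeeds. Returns the number of rounds used.
     */
    public static int retryUntilSuccess(BooleanSupplier task, int maxAttemptsPerRound) {
        int rounds = 0;
        while (true) {
            rounds++;
            if (splitWithBoundedRetries(task, maxAttemptsPerRound)) {
                return rounds;
            }
        }
    }
}
```

A tool would call only the bounded method and handle a false result itself, 
while a master-like caller would go through retryUntilSuccess.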

So the behavior outlined in this bug report is correct. But this behavior 
shouldn't lead to any bug.

(There are only a few places in SplitLogManager where it resubmits the task 
forcefully, disregarding the retry limit. I think the only two cases are when a 
region server (splitlogworker) dies and when a splitlogworker "resigns" from 
the task, i.e. gives up the task even though there were no failures.)
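
As a rough sketch of the difference between a checked and a forced resubmit, 
the following assumes a policy with a resubmit limit and a minimum delay since 
the task was last updated. ResubmitSketch, Task, and the field names are 
hypothetical; only the CHECK and FORCE directive names come from the issue 
description, and the real SplitLogManager logic may differ.

```java
public class ResubmitSketch {

    public enum Directive { CHECK, FORCE }

    /** Hypothetical per-task bookkeeping. */
    public static final class Task {
        public int unforcedResubmits;   // resubmits counted against the limit
        public long lastUpdateMillis;   // last time the task changed state
    }

    private final int resubmitLimit;
    private final long timeoutMillis;

    public ResubmitSketch(int resubmitLimit, long timeoutMillis) {
        this.resubmitLimit = resubmitLimit;
        this.timeoutMillis = timeoutMillis;
    }

    /**
     * Returns true if the task is resubmitted. With CHECK, both the delay and
     * the resubmit limit are enforced; with FORCE, both are bypassed and the
     * resubmit is not counted against the limit.
     */
    public boolean resubmit(Task task, Directive directive, long nowMillis) {
        if (directive != Directive.FORCE) {
            if (nowMillis - task.lastUpdateMillis < timeoutMillis) {
                return false; // delay not reached yet
            }
            if (task.unforcedResubmits >= resubmitLimit) {
                return false; // retry limit exhausted
            }
            task.unforcedResubmits++;
        }
        task.lastUpdateMillis = nowMillis;
        return true;
    }
}
```

Under these assumptions, a CHECK request made shortly after the last update is 
refused, whereas a FORCE request at the same moment goes through.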
                
> DistributerLogSplit can fail to resubmit a task done if there is an exception 
> during the log archiving
> ------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-6878
>                 URL: https://issues.apache.org/jira/browse/HBASE-6878
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>            Reporter: nkeywal
>            Priority: Minor
>
> The code in SplitLogManager#getDataSetWatchSuccess is:
> {code}
> if (slt.isDone()) {
>       LOG.info("task " + path + " entered state: " + slt.toString());
>       if (taskFinisher != null && !ZKSplitLog.isRescanNode(watcher, path)) {
>         if (taskFinisher.finish(slt.getServerName(),
>             ZKSplitLog.getFileName(path)) == Status.DONE) {
>           setDone(path, SUCCESS);
>         } else {
>           resubmitOrFail(path, CHECK);
>         }
>       } else {
>         setDone(path, SUCCESS);
>       }
> {code}
>           resubmitOrFail(path, CHECK);
> should be 
>           resubmitOrFail(path, FORCE);
> Without it, the task won't be resubmitted if the delay is not reached, and 
> the task will be marked as failed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
