[ 
https://issues.apache.org/jira/browse/HBASE-4007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052067#comment-13052067
 ] 

Prakash Khemani commented on HBASE-4007:
----------------------------------------

@mingjian What you are describing is probably a different issue. That 
scenario can happen when:
1. The master puts up a task.
2. No one acquires the task.
3. The master puts up a RESCAN node asking everyone to re-look at the zk 
splitlog task list.

The bug described in this jira happens in the following way (I have not 
encountered it yet, but it should be easy to reproduce):

a/ A splitlog task is slow. The master has already moved the task from one 
worker to another 3 times, and it is now with the 4th worker. Even if the 4th 
worker takes too long on this task, the master is not going to do anything 
about it.

b/ The 4th worker dies.

c/ The task hangs.

The master has to resubmit the task when the 4th worker dies.
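
A minimal sketch of the scenario above, in Java. Everything in it is a 
hypothetical, simplified model for illustration: the names SplitLogSketch, 
Manager, onTimeout and onWorkerDead are not taken from the real 
SplitLogManager code, and maxResubmits only stands in for the configured 
retry limit. It shows the timeout path giving up after 3 resubmits, and a 
separate worker-death path that resubmits regardless of the limit.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified model of the master's task bookkeeping. */
public class SplitLogSketch {

    static final class Task {
        String owner;   // worker currently holding the task, or null if unowned
        int resubmits;  // how many times the master has taken the task back
    }

    static final class Manager {
        private final int maxResubmits; // assumed stand-in for the configured limit
        private final Map<String, Task> tasks = new HashMap<>();

        Manager(int maxResubmits) {
            this.maxResubmits = maxResubmits;
        }

        /** Records that a worker has acquired the task. */
        void assign(String path, String worker) {
            tasks.computeIfAbsent(path, p -> new Task()).owner = worker;
        }

        /** Timeout path: stops resubmitting once the limit is reached. */
        boolean onTimeout(String path) {
            Task t = tasks.get(path);
            if (t == null || t.resubmits >= maxResubmits) {
                return false;
            }
            t.owner = null;
            t.resubmits++;
            return true;
        }

        /** Worker-death path: frees every task the dead worker owned, ignoring the limit. */
        int onWorkerDead(String worker) {
            int freed = 0;
            for (Task t : tasks.values()) {
                if (worker.equals(t.owner)) {
                    t.owner = null;
                    freed++;
                }
            }
            return freed;
        }

        String ownerOf(String path) {
            Task t = tasks.get(path);
            return t == null ? null : t.owner;
        }
    }

    public static void main(String[] args) {
        Manager m = new Manager(3);
        m.assign("log-1", "worker-1");
        // a/ the slow task is moved 3 times and ends up with the 4th worker
        for (int i = 2; i <= 4; i++) {
            m.onTimeout("log-1");
            m.assign("log-1", "worker-" + i);
        }
        System.out.println(m.onTimeout("log-1"));       // false: limit reached
        // b/ the 4th worker dies; without the next call the task would hang (c/)
        System.out.println(m.onWorkerDead("worker-4")); // 1 task freed
        System.out.println(m.ownerOf("log-1"));         // null: available again
    }
}
```

The point of keeping the two paths separate in this sketch is that the retry 
limit guards against a task that is slow everywhere, while a dead owner 
means nobody is working on the task at all.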



> distributed log splitting can get indefinitely stuck
> ----------------------------------------------------
>
>                 Key: HBASE-4007
>                 URL: https://issues.apache.org/jira/browse/HBASE-4007
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Prakash Khemani
>            Assignee: Prakash Khemani
>
> After the configured number of retries SplitLogManager is not going to 
> resubmit log-split tasks. In this situation even if the splitLogWorker that 
> owns the task dies the task will not get resubmitted.
> When a regionserver goes away then all the split-log tasks that it owned 
> should be resubmitted by the SplitLogMaster.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
