[ https://issues.apache.org/jira/browse/HBASE-21934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773780#comment-16773780 ]

Jingyun Tian commented on HBASE-21934:
--------------------------------------

After checking the code and logs, I found the problem: once we dispatch these 
operations to the region server, the Set that stores them is set to null.

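{code}
public synchronized void dispatch() {
  if (operations != null) {
    // Hand the buffered batch to the target region server...
    remoteDispatch(getKey(), operations);
    // ...then drop the reference, so this node no longer knows what it sent.
    this.operations = null;
  }
}
{code}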

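Then, when the target region server crashes and abortOperationsInQueue is 
called, it only fails the operations that have not been sent yet. Operations 
that were already dispatched are no longer in the Set, so they are never 
failed, and the procedures waiting on them hang.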

{code}
public synchronized void abortOperationsInQueue() {
  if (operations != null) {
    // Only operations still buffered (not yet dispatched) are failed here.
    abortPendingOperations(getKey(), operations);
    this.operations = null;
  }
}
{code}
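To make the sequence concrete, here is a small self-contained model of the buffer lifecycle. It is a hypothetical sketch, not the HBase class: BufferNodeModel, the String operation IDs, and the sent/aborted lists (standing in for remoteDispatch and abortPendingOperations) are all assumptions. It shows that once dispatch() has run, a crash that triggers abortOperationsInQueue() fails nothing.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of the per-server buffer; not the real HBase class.
class BufferNodeModel {
  private Set<String> operations;                  // operations queued but not yet sent
  final List<String> sent = new ArrayList<>();     // stands in for remoteDispatch()
  final List<String> aborted = new ArrayList<>();  // stands in for abortPendingOperations()

  synchronized void add(String op) {
    if (operations == null) {
      operations = new HashSet<>();
    }
    operations.add(op);
  }

  synchronized void dispatch() {
    if (operations != null) {
      sent.addAll(operations);  // hand the batch to the region server
      operations = null;        // the buffer forgets what it sent
    }
  }

  synchronized void abortOperationsInQueue() {
    if (operations != null) {
      aborted.addAll(operations);  // only still-queued operations are failed
      operations = null;
    }
  }

  public static void main(String[] args) {
    BufferNodeModel node = new BufferNodeModel();
    node.add("splitWAL-1");
    node.dispatch();                // the operation is now in flight on the server
    node.abortOperationsInQueue();  // the server crashes
    System.out.println("sent=" + node.sent + " aborted=" + node.aborted);
  }
}
```

Running main prints sent=[splitWAL-1] aborted=[]: the in-flight operation is never failed, so whatever waits on it keeps waiting.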

I'll add a test for this problem later. To solve it, one option is to remove 
an operation from the Set only when it has finished. Another is to go through 
all procedures and find the ones related to the crashed region server.

> SplitWALProcedure get stuck during ITBLL
> ----------------------------------------
>
>                 Key: HBASE-21934
>                 URL: https://issues.apache.org/jira/browse/HBASE-21934
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Jingyun Tian
>            Assignee: Jingyun Tian
>            Priority: Major
>
> I encountered a problem where the master assigned a splitWALRemoteProcedure 
> to a region server. The log of this region server says it failed to recover 
> the lease of the WAL file. Then this region server was killed by chaosMonkey. 
> As a result, this procedure never times out and hangs there forever.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)