[
https://issues.apache.org/jira/browse/HBASE-21934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773780#comment-16773780
]
Jingyun Tian commented on HBASE-21934:
--------------------------------------
After checking the code and logs, I found the problem: once we dispatch these
operations to the region server, the Set that stores them is set to null.
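{code}
public synchronized void dispatch() {
  if (operations != null) {
    // Send every buffered operation to the target region server...
    remoteDispatch(getKey(), operations);
    // ...then drop the set, so this buffer no longer tracks them.
    this.operations = null;
  }
}
{code}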
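Then, when the target region server crashes and abortOperationsInQueue is
called, it only fails the operations that have not been sent yet. Operations
that were already dispatched are never failed, so their procedures keep waiting.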
{code}
public synchronized void abortOperationsInQueue() {
  if (operations != null) {
    // Only operations still buffered (not yet dispatched) are failed here.
    abortPendingOperations(getKey(), operations);
    this.operations = null;
  }
}
{code}
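To make the failure mode concrete, here is a minimal, self-contained Java model of the two methods above. BufferNode, add, and the sent/aborted lists are hypothetical stand-ins (not HBase classes) for the real buffer, remoteDispatch and abortPendingOperations; the sketch only shows that an operation dispatched before the crash is never failed.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of the buffer node above; not HBase code.
public class BufferBugDemo {
  static final class BufferNode {
    private Set<String> operations;                 // buffered, not yet sent
    final List<String> sent = new ArrayList<>();    // stands in for remoteDispatch
    final List<String> aborted = new ArrayList<>(); // stands in for abortPendingOperations

    synchronized void add(String op) {
      if (operations == null) {
        operations = new HashSet<>();
      }
      operations.add(op);
    }

    synchronized void dispatch() {
      if (operations != null) {
        sent.addAll(operations);
        operations = null; // the buffer forgets the operations once sent
      }
    }

    synchronized void abortOperationsInQueue() {
      if (operations != null) {
        aborted.addAll(operations);
        operations = null;
      }
    }
  }

  public static void main(String[] args) {
    BufferNode node = new BufferNode();
    node.add("splitWAL-1");
    node.dispatch();               // sent to the region server
    node.abortOperationsInQueue(); // the region server crashes afterwards
    // The dispatched operation is never aborted.
    System.out.println("sent=" + node.sent + " aborted=" + node.aborted);
    // prints: sent=[splitWAL-1] aborted=[]
  }
}
```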
I'll add a test for this problem later. One way to solve it is to remove an
operation only when it has finished. Alternatively, we could go through all
procedures to find the ones related to the crashed region server.
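A rough sketch of the first option, again using a hypothetical simplified model rather than the real dispatcher code: the node keeps dispatched operations in a second set until the region server reports them finished (operationFinished is an assumed callback, not an existing HBase method), so abortOperationsInQueue can fail both queued and in-flight operations.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of option 1: keep an operation until it finishes,
// so a crash fails both queued and already-dispatched operations.
public class BufferFixSketch {
  static final class BufferNode {
    private final Set<String> pending = new HashSet<>();    // queued, not yet sent
    private final Set<String> dispatched = new HashSet<>(); // sent, not yet finished
    final List<String> aborted = new ArrayList<>();

    synchronized void add(String op) {
      pending.add(op);
    }

    synchronized void dispatch() {
      // The real code would call remoteDispatch(...) here; we only move the ops.
      dispatched.addAll(pending);
      pending.clear();
    }

    // Assumed callback: the region server reports that an operation finished.
    synchronized void operationFinished(String op) {
      dispatched.remove(op);
    }

    synchronized void abortOperationsInQueue() {
      // Fail everything still outstanding, not only what was never sent.
      aborted.addAll(pending);
      aborted.addAll(dispatched);
      pending.clear();
      dispatched.clear();
    }
  }

  public static void main(String[] args) {
    BufferNode node = new BufferNode();
    node.add("splitWAL-1");
    node.add("splitWAL-2");
    node.dispatch();
    node.operationFinished("splitWAL-1");
    node.abortOperationsInQueue(); // the region server crashes
    System.out.println("aborted=" + node.aborted);
    // prints: aborted=[splitWAL-2]
  }
}
```

The second option (scanning all procedures for ones targeting the crashed server) would avoid this extra bookkeeping in the buffer, at the cost of walking every procedure on each crash.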
> SplitWALProcedure get stuck during ITBLL
> ----------------------------------------
>
> Key: HBASE-21934
> URL: https://issues.apache.org/jira/browse/HBASE-21934
> Project: HBase
> Issue Type: Sub-task
> Reporter: Jingyun Tian
> Assignee: Jingyun Tian
> Priority: Major
>
> I encountered a problem: the master assigned a splitWALRemoteProcedure to
> a region server, and the log of this region server said it failed to recover
> the lease of the WAL file. Then this region server was killed by chaosMonkey.
> As a result, this procedure did not time out and hung there forever.