[ 
https://issues.apache.org/jira/browse/HBASE-29256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17943747#comment-17943747
 ] 

Duo Zhang commented on HBASE-29256:
-----------------------------------

OK, so it is the same problem, the procedure store failed to persist.

I suggest we call server.abort in RegionProcedureStore when updating the 
region fails.
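
A minimal sketch of that idea, not the actual RegionProcedureStore code. The 
names Abortable, RegionUpdate and SketchProcedureStore here are hypothetical 
stand-ins defined only for this example; the assumption is that the store 
holds a reference to something that can abort the master and calls it when 
the write to the backing region throws.

```java
import java.io.IOException;
import java.io.InterruptedIOException;

public class AbortOnStoreFailureSketch {

    /** Hypothetical stand-in for the master's abort hook (assumption). */
    interface Abortable {
        void abort(String why, Throwable cause);
    }

    /** Hypothetical stand-in for the write to the backing region (assumption). */
    interface RegionUpdate {
        void apply() throws IOException;
    }

    /** Sketch of a procedure store that aborts the server on a failed update. */
    static final class SketchProcedureStore {
        private final Abortable server;

        SketchProcedureStore(Abortable server) {
            this.server = server;
        }

        /**
         * Applies the update. Returns true if it was persisted. On an
         * IOException it asks the server to abort and returns false, instead
         * of only logging the error and leaving the procedure holding its lock.
         */
        boolean update(long procId, RegionUpdate update) {
            try {
                update.apply();
                return true;
            } catch (IOException e) {
                server.abort("Failed to update proc pid=" + procId, e);
                return false;
            }
        }
    }

    public static void main(String[] args) {
        SketchProcedureStore store = new SketchProcedureStore(
            (why, cause) -> System.out.println("ABORT: " + why + " (" + cause + ")"));

        // Simulate the failure mode from the report: the flush never gets acked.
        store.update(966118L, () -> {
            throw new InterruptedIOException("No ack received after 25s and a timeout of 25s");
        });
    }
}
```

With an abort here, a standby master (or the restarted one) would reload the 
procedures from the store, rather than leaving the first procedure stuck 
in memory while holding the region lock.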



> Multiple Split Procedures on same region stuck indefinitely waiting for 
> Exclusive Lock
> --------------------------------------------------------------------------------------
>
>                 Key: HBASE-29256
>                 URL: https://issues.apache.org/jira/browse/HBASE-29256
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Prathyusha
>            Priority: Major
>
> Multiple Split Procedures on the same region got stuck indefinitely waiting for 
> the Exclusive Lock held by the first Split Procedure created on the region, and 
> that procedure wasn't scheduled for almost a week, until an HMaster restart 
> happened.
> The first SplitProcedure created failed to update the procedure store:
> {color:#4c9aff}_ERROR [PEWorker-25] region.RegionProcedureStore - Failed to 
> update proc pid=966118, state=RUNNABLE:SPLIT_TABLE_REGION_PREPARE, 
> locked=true; SplitTableRegionProcedure table=_tablename_, 
> parent=_parent-XXX_, daughterA=_daughter1-xxx_, daughterB=_daughter2-xxx_ 
> java.io.InterruptedIOException: No ack received after 25s and a timeout of 
> 25s at 
> org.apache.hadoop.hdfs.DataStreamer.waitForAckedSeqno(DataStreamer.java:938) 
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:692) 
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:580)_{color}
> All the rest of the SplitProcedures were waiting on the Exclusive Lock held 
> by the above pid, and the first one was never rescheduled until an HMaster restart.
> {color:#4c9aff}_assignment.SplitTableRegionProcedure - LOCK_EVENT_WAIT 
> serverLocks={}, namespaceLocks={{default=exclusiveLockOwner=NONE, 
> sharedLockCount=1, waitingProcCount=0}}, 
> tableLocks={{tsdb=exclusiveLockOwner=NONE, sharedLockCount=1, 
> waitingProcCount=0}}, regionLocks={{parent-XXX=exclusiveLockOwner=966118, 
> sharedLockCount=0, waitingProcCount=8043}}, peerLocks={}, metaLocks={}_{color}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
