[ https://issues.apache.org/jira/browse/HBASE-20846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16540528#comment-16540528 ]
stack commented on HBASE-20846: ------------------------------- Updating procedure store is expensive...takes time. Do we need a new boolean? What cases do we need to handle on load post-crash? To my mind, the procedure that holds a lock for the life of the procedure is the only case that needs re-institution? When we read the store WALs on startup, we read in order. Can we not just look for Procedures with holdLock=true and if procedure is not yet finished -- it has sub-procedures or not -- just re-create their lock before starting the procedure workers? > Table's shared lock is not held by sub-procedures after master restart > ---------------------------------------------------------------------- > > Key: HBASE-20846 > URL: https://issues.apache.org/jira/browse/HBASE-20846 > Project: HBase > Issue Type: Sub-task > Affects Versions: 2.1.0 > Reporter: Allan Yang > Assignee: Allan Yang > Priority: Major > Fix For: 3.0.0, 2.0.2, 2.1.1 > > Attachments: HBASE-20846.branch-2.0.002.patch, > HBASE-20846.branch-2.0.patch > > > Found this one when investigating ModifyTableProcedure got stuck while there > was a MoveRegionProcedure going on after master restart. > Though this issue can be solved by HBASE-20752. But I discovered something > else. > Before a MoveRegionProcedure can execute, it will hold the table's shared > lock. so,, when a UnassignProcedure was spwaned, it will not check the > table's shared lock since it is sure that its parent(MoveRegionProcedure) has > aquired the table's lock. > {code:java} > // If there is parent procedure, it would have already taken xlock, so no > need to take > // shared lock here. Otherwise, take shared lock. > if (!procedure.hasParent() > && waitTableQueueSharedLock(procedure, table) == null) { > return true; > } > {code} > But, it is not the case when Master was restarted. The child > procedure(UnassignProcedure) will be executed first after restart. Though it > has a parent(MoveRegionProcedure), but apprently the parent didn't hold the > table's lock. > So, since it began to execute without hold the table's shared lock. A > ModifyTableProcedure can aquire the table's exclusive lock and execute at the > same time. Which is not possible if the master was not restarted. > This will cause a stuck before HBASE-20752. But since HBASE-20752 has fixed, > I wrote a simple UT to repo this case. > I think we don't have to check the parent for table's shared lock. It is a > shared lock, right? I think we can acquire it every time we need it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)