[
https://issues.apache.org/jira/browse/HBASE-20828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16533239#comment-16533239
]
Duo Zhang commented on HBASE-20828:
-----------------------------------
[~allan163] found a problem when restarting the master: we do not restore the
locks when loading procedures. We then also found that the assumption in
MasterProcedureScheduler.waitRegions is not correct, as the parent procedure of
a RegionTransitionProcedure may not hold the table lock (think of SCP).
So I think there are two problems that need to be fixed.
First, we need to restore the locks when loading procedures. A first thought is
that, after loading all the procedures and the procedure execution stacks, we
scan all the procedures that have sub procedures. Then, for every stack, we
start from the root procedure and test its holdLock method; if it returns true,
we call its acquireLock method to get the lock back. Not sure if there are
still corner cases. [~allan163] PTAL.
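The restore pass described above could be sketched roughly as follows. This is
only an illustration of the idea, not the actual patch: Proc, subProcs,
lockHeld and restoreLocks are hypothetical names, the holdLock flag and
acquireLock method are simplified stand-ins for the real ones, and the
procedure names and flag values in main are made up. It also assumes that a
procedure without sub procedures takes its own lock when it is next executed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LockRestoreSketch {

  /** Hypothetical, simplified stand-in for a procedure; not the real HBase class. */
  static final class Proc {
    final String name;
    final boolean holdLock;                        // stands in for the holdLock check
    final List<Proc> subProcs = new ArrayList<>();
    boolean lockHeld;                              // false right after loading

    Proc(String name, boolean holdLock) {
      this.name = name;
      this.holdLock = holdLock;
    }

    Proc addSub(Proc sub) {
      subProcs.add(sub);
      return this;
    }

    /** Stand-in for acquireLock; a real version would go through the scheduler. */
    void acquireLock() {
      lockHeld = true;
    }
  }

  /** Walks every stack from its root and returns the names of procedures whose lock was restored. */
  static List<String> restoreLocks(List<Proc> roots) {
    List<String> restored = new ArrayList<>();
    for (Proc root : roots) {
      restore(root, restored);
    }
    return restored;
  }

  private static void restore(Proc proc, List<String> restored) {
    if (proc.subProcs.isEmpty()) {
      return; // assumption: a leaf takes its own lock when it is next executed
    }
    if (proc.holdLock) {
      proc.acquireLock();
      restored.add(proc.name);
    }
    for (Proc sub : proc.subProcs) {
      restore(sub, restored);
    }
  }

  public static void main(String[] args) {
    // Illustrative stacks only; the flag values are not taken from HBase.
    Proc first = new Proc("no-lock-parent", false).addSub(new Proc("sub-1", true));
    Proc second = new Proc("lock-holding-parent", true).addSub(new Proc("sub-2", true));
    System.out.println(restoreLocks(Arrays.asList(first, second)));
  }
}
```

Walking from the root downward means a parent's lock is back in place before
any of its sub procedures are looked at, which is the order the locks were
taken in originally.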
For the waitRegions method, I think we should apply the patch in HBASE-20846,
i.e., always try to acquire the shared lock. But the implementation of the
procedure lock needs a small modification. If the parent procedure already
holds the exclusive lock, then instead of returning false to make the procedure
wait, we should return true to let the procedure go on. Locks that are already
held by parent procedures should also be considered as held by their sub
procedures. This is OK because the parent procedure will not release the lock
before the sub procedures do, as it can only be executed again after all the
sub procedures have finished.
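A minimal sketch of the proposed lock behavior, again with hypothetical names
(Proc, LockSketch, hasAncestor, tryShared and tryExclusive are not the real
HBase lock API). The only point it illustrates is that a shared-lock request
succeeds when the exclusive owner is an ancestor of the requester:

```java
import java.util.HashSet;
import java.util.Set;

final class ParentLockSketch {

  /** Hypothetical stand-in for a procedure that knows its parent. */
  static final class Proc {
    final long id;
    final Proc parent; // null for a root procedure

    Proc(long id, Proc parent) {
      this.id = id;
      this.parent = parent;
    }

    /** True if 'other' is the parent, grandparent, and so on of this procedure. */
    boolean hasAncestor(Proc other) {
      for (Proc p = parent; p != null; p = p.parent) {
        if (p == other) {
          return true;
        }
      }
      return false;
    }
  }

  /** Hypothetical lock with one exclusive owner or any number of shared holders. */
  static final class LockSketch {
    private Proc exclusiveOwner;
    private final Set<Proc> sharedHolders = new HashSet<>();

    boolean tryExclusive(Proc proc) {
      if (exclusiveOwner != null || !sharedHolders.isEmpty()) {
        return false;
      }
      exclusiveOwner = proc;
      return true;
    }

    boolean tryShared(Proc proc) {
      if (exclusiveOwner == null) {
        sharedHolders.add(proc);
        return true;
      }
      // Proposed change: a lock held by an ancestor counts as held by the sub
      // procedure, so let it go on instead of making it wait.
      return proc.hasAncestor(exclusiveOwner);
    }
  }

  public static void main(String[] args) {
    Proc parent = new Proc(1, null);
    Proc child = new Proc(2, parent);
    Proc unrelated = new Proc(3, null);

    LockSketch lock = new LockSketch();
    System.out.println("parent exclusive: " + lock.tryExclusive(parent));
    System.out.println("child shared: " + lock.tryShared(child));
    System.out.println("unrelated shared: " + lock.tryShared(unrelated));
  }
}
```

In this sketch the sub procedure does not record its own hold when it borrows
the ancestor's lock, so it has nothing to release; the parent still releases
the exclusive lock only after its sub procedures have finished.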
> Finish-up AMv2 Design/List of Tenets/Specification of operation
> ---------------------------------------------------------------
>
> Key: HBASE-20828
> URL: https://issues.apache.org/jira/browse/HBASE-20828
> Project: HBase
> Issue Type: Umbrella
> Components: amv2
> Reporter: stack
> Priority: Major
>
> AMv2 is missing specification. There are too many grey-areas still. Also
> missing are a concise listing of the tenets of AMv2 operation. Here are some
> examples:
> * HBASE-19529 "Handle null states in AM": Asks how we should treat a null
> state in hbase:meta. What does it 'mean'? We seem to treat it differently
> depending on context. Needs clarification. [~Apache9] recently asked a
> similar question about the meaning of OFFLINE.
> * Logging needs to have a particular form to help trace Procedure progress;
> needs a write-up.
> Let's fill in items to address in this umbrella issue. We can address them
> in subissues and produce a specification doc too. We have the below, but
> these are mostly (incomplete) descriptions for devs on pv2 and amv2; the
> specification is missing:
> http://hbase.apache.org/book.html#pv2
> http://hbase.apache.org/book.html#amv2
> (Other areas include addressing what is up w/ rollback -- when, how much, and
> when it is not appropriate -- as well as recommendations on Procedure
> coarseness, locking -- is it ok to lock a table in an alter table procedure
> for the life of the procedure? -- and so on).