[
https://issues.apache.org/jira/browse/HBASE-18261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16108022#comment-16108022
]
Umesh Agashe commented on HBASE-18261:
--------------------------------------
Thanks for reviewing and pushing the changes, [~stack]!
> [AMv2] Create new RecoverMetaProcedure and use it from ServerCrashProcedure
> and HMaster.finishActiveMasterInitialization()
> --------------------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-18261
> URL: https://issues.apache.org/jira/browse/HBASE-18261
> Project: HBase
> Issue Type: Improvement
> Components: amv2
> Affects Versions: 2.0.0-alpha-1
> Reporter: Umesh Agashe
> Assignee: Umesh Agashe
> Fix For: 2.0.0-alpha-2
>
> Attachments: hbase-18261.master.001.patch,
> HBASE-18261.master.001.patch, hbase-18261.master.002.patch,
> hbase-18261.master.003.patch, hbase-18261.master.004.patch,
> hbase-18261.master.005.patch
>
>
> When the unit test
> hbase.master.procedure.TestServerCrashProcedure#testRecoveryAndDoubleExecutionOnRsWithMeta()
> is enabled and run several times, it fails intermittently. The cause is that
> meta recovery is done in two different places:
> * ServerCrashProcedure.processMeta()
> * HMaster.finishActiveMasterInitialization()
> and the two are not coordinated.
> When HMaster.finishActiveMasterInitialization() submits splitMetaLog()
> first, a call from ServerCrashProcedure.processMeta() made while that split
> is still running fails, causing the step to be retried in a loop.
> When ServerCrashProcedure.processMeta() submits splitMetaLog() after the
> splitMetaLog() from HMaster.finishActiveMasterInitialization() has finished,
> success is returned without doing any work.
> But if ServerCrashProcedure.processMeta() submits the splitMetaLog() request
> first and HMaster.finishActiveMasterInitialization() submits it while that
> request is still in progress, the test fails with an exception.
> [~stack] and I discussed a possible solution:
> Create RecoverMetaProcedure and call it wherever meta recovery is required.
> The procedure framework provides mutual exclusion and requires idempotence,
> which should fix the problem.
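
To illustrate why mutual exclusion plus idempotence should remove the race
described above, here is a minimal sketch in plain Java. It is not HBase code
and does not use the procedure framework: MetaRecoverySketch, recoverMeta(),
and the counter are hypothetical names, and the private splitMetaLog() is only
a placeholder for the real log-splitting work. The point is that both callers
go through one guarded entry point, so a late or concurrent call becomes a
no-op instead of a failure.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the idea behind RecoverMetaProcedure; not HBase code.
class MetaRecoverySketch {
    private final ReentrantLock lock = new ReentrantLock(); // mutual exclusion
    private boolean recovered = false;                      // guarded by lock
    private final AtomicInteger splitCount = new AtomicInteger();

    /** Safe to call from several callers; the split work runs at most once. */
    void recoverMeta() {
        lock.lock();
        try {
            if (recovered) {
                return; // idempotent: a repeated call is a no-op, not an error
            }
            splitMetaLog();
            recovered = true; // only set after the work succeeded, so a failure can be retried
        } finally {
            lock.unlock();
        }
    }

    // Placeholder for the real log-splitting work (assumption for illustration).
    private void splitMetaLog() {
        splitCount.incrementAndGet();
    }

    int splitCount() {
        return splitCount.get();
    }

    public static void main(String[] args) throws InterruptedException {
        MetaRecoverySketch sketch = new MetaRecoverySketch();
        // Two uncoordinated callers, like the crash handler and master initialization.
        Thread crashHandler = new Thread(sketch::recoverMeta);
        Thread masterInit = new Thread(sketch::recoverMeta);
        crashHandler.start();
        masterInit.start();
        crashHandler.join();
        masterInit.join();
        System.out.println("splitMetaLog ran " + sketch.splitCount() + " time(s)");
    }
}
```

Whichever thread wins the lock does the work; the other waits and then returns
without repeating it, which covers all three orderings listed in the
description.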
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)