[ 
https://issues.apache.org/jira/browse/HBASE-18261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Umesh Agashe updated HBASE-18261:
---------------------------------
    Attachment: hbase-18261.master.003.patch

> [AMv2] Create new RecoverMetaProcedure and use it from ServerCrashProcedure 
> and HMaster.finishActiveMasterInitialization()
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-18261
>                 URL: https://issues.apache.org/jira/browse/HBASE-18261
>             Project: HBase
>          Issue Type: Improvement
>          Components: amv2
>    Affects Versions: 2.0.0-alpha-1
>            Reporter: Umesh Agashe
>            Assignee: Umesh Agashe
>             Fix For: 2.0.0-alpha-2
>
>         Attachments: hbase-18261.master.001.patch, 
> HBASE-18261.master.001.patch, hbase-18261.master.002.patch, 
> hbase-18261.master.003.patch
>
>
> When the unit test 
> hbase.master.procedure.TestServerCrashProcedure#testRecoveryAndDoubleExecutionOnRsWithMeta()
>  is enabled and run several times, it fails intermittently. The cause is that 
> meta recovery is done in two different places:
> * ServerCrashProcedure.processMeta()
> * HMaster.finishActiveMasterInitialization()
> and the two are not coordinated.
> When HMaster.finishActiveMasterInitialization() submits splitMetaLog() first, 
> a call from ServerCrashProcedure.processMeta() made while it is running 
> fails, causing the step to be retried in a loop.
> When ServerCrashProcedure.processMeta() submits splitMetaLog() after the 
> splitMetaLog() from HMaster.finishActiveMasterInitialization() has finished, 
> success is returned without doing any work.
> But if ServerCrashProcedure.processMeta() submits the splitMetaLog() request 
> and HMaster.finishActiveMasterInitialization() submits it while that request 
> is still in progress, the test fails with an exception.
> [~stack] and I discussed a possible solution:
> Create RecoverMetaProcedure and call it where required. The procedure 
> framework provides mutual exclusion and requires idempotence, which should 
> fix the problem.
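
The two properties named above, mutual exclusion and idempotence, can be illustrated with a minimal, self-contained Java sketch. This is not HBase code and not the attached patch: the class MetaRecoverySketch, its recoverMeta() method, and the splitCount counter are hypothetical stand-ins, and a plain ReentrantLock is assumed here in place of whatever locking the procedure framework actually uses. The two submitted tasks stand in for the two call sites (ServerCrashProcedure.processMeta() and HMaster.finishActiveMasterInitialization()).

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration only; not HBase's RecoverMetaProcedure.
final class MetaRecoverySketch {
    // Mutual exclusion: only one caller at a time may run the recovery step.
    private final ReentrantLock lock = new ReentrantLock();
    // Recorded completion state makes a repeated call a harmless no-op.
    private boolean metaLogSplit = false;
    // Counts how many times the (simulated) split work was really performed.
    private final AtomicInteger splitCount = new AtomicInteger();

    /** Returns true if this call performed the work, false if it was already done. */
    boolean recoverMeta() {
        lock.lock();
        try {
            if (metaLogSplit) {
                return false; // idempotent: nothing left to do
            }
            splitCount.incrementAndGet(); // stands in for splitting the meta log
            metaLogSplit = true;
            return true;
        } finally {
            lock.unlock();
        }
    }

    int splitCount() {
        return splitCount.get();
    }

    public static void main(String[] args) throws InterruptedException {
        MetaRecoverySketch sketch = new MetaRecoverySketch();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        // Two uncoordinated callers, as in the two call sites described above.
        pool.execute(() -> sketch.recoverMeta());
        pool.execute(() -> sketch.recoverMeta());
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println("split performed " + sketch.splitCount() + " time(s)");
    }
}
```

In this sketch, whichever caller arrives second either waits for the lock or finds the work already recorded as done, so neither ordering leads to a failed or duplicated split.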



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
