[ 
https://issues.apache.org/jira/browse/HBASE-19287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16282463#comment-16282463
 ] 

stack commented on HBASE-19287:
-------------------------------

Thanks for the patch.  Could bulk of handleMetaRITOnCrashedServer go inside of 
AM? Perhaps as two methods on AM? So we don't have to entities other than AM 
having to know about RIT?

File an issue for the bigger problem? Thanks [~easyliangjob]

> master hangs forever if RecoverMeta send assign meta region request to target 
> server fail
> -----------------------------------------------------------------------------------------
>
>                 Key: HBASE-19287
>                 URL: https://issues.apache.org/jira/browse/HBASE-19287
>             Project: HBase
>          Issue Type: Bug
>          Components: proc-v2
>    Affects Versions: 2.0.0
>            Reporter: Yi Liang
>            Assignee: Yi Liang
>         Attachments: master.patch
>
>
> 2017-11-10 19:26:56,019 INFO  [ProcExecWrkr-1] 
> procedure.RecoverMetaProcedure: pid=138, 
> state=RUNNABLE:RECOVER_META_ASSIGN_REGIONS; RecoverMetaProcedure 
> failedMetaServer=null, splitWal=true; Retaining meta assignment to 
> server=hadoop-slave1.hadoop,16020,1510341981454
> 2017-11-10 19:26:56,029 INFO  [ProcExecWrkr-1] procedure2.ProcedureExecutor: 
> Initialized subprocedures=[{pid=139, ppid=138, 
> state=RUNNABLE:REGION_TRANSITION_QUEUE; AssignProcedure table=hbase:meta, 
> region=1588230740, target=hadoop-slave1.hadoop,16020,1510341981454}]
> 2017-11-10 19:26:56,067 INFO  [ProcExecWrkr-2] 
> procedure.MasterProcedureScheduler: pid=139, ppid=138, 
> state=RUNNABLE:REGION_TRANSITION_QUEUE; AssignProcedure table=hbase:meta, 
> region=1588230740, target=hadoop-slave1.hadoop,16020,1510341981454 hbase:meta 
> hbase:meta,,1.1588230740
> 2017-11-10 19:26:56,071 INFO  [ProcExecWrkr-2] assignment.AssignProcedure: 
> Start pid=139, ppid=138, state=RUNNABLE:REGION_TRANSITION_QUEUE; 
> AssignProcedure table=hbase:meta, region=1588230740, 
> target=hadoop-slave1.hadoop,16020,1510341981454; rit=OFFLINE, 
> location=hadoop-slave1.hadoop,16020,1510341981454; forceNewPlan=false, 
> retain=false
> 2017-11-10 19:26:56,224 INFO  [ProcExecWrkr-4] zookeeper.MetaTableLocator: 
> Setting hbase:meta (replicaId=0) location in ZooKeeper as 
> hadoop-slave2.hadoop,16020,1510341988652
> 2017-11-10 19:26:56,230 INFO  [ProcExecWrkr-4] 
> assignment.RegionTransitionProcedure: Dispatch pid=139, ppid=138, 
> state=RUNNABLE:REGION_TRANSITION_DISPATCH; AssignProcedure table=hbase:meta, 
> region=1588230740, target=hadoop-slave1.hadoop,16020,1510341981454; 
> rit=OPENING, location=hadoop-slave2.hadoop,16020,1510341988652
> 2017-11-10 19:26:56,382 INFO  [ProcedureDispatcherTimeoutThread] 
> procedure.RSProcedureDispatcher: Using procedure batch rpc execution for 
> serverName=hadoop-slave2.hadoop,16020,1510341988652 version=2097152
> 2017-11-10 19:26:57,542 INFO  [main-EventThread] 
> zookeeper.RegionServerTracker: RegionServer ephemeral node deleted, 
> processing expiration [hadoop-slave2.hadoop,16020,1510341988652]
> 2017-11-10 19:26:57,543 INFO  [main-EventThread] master.ServerManager: Master 
> doesn't enable ServerShutdownHandler during initialization, delay expiring 
> server hadoop-slave2.hadoop,16020,1510341988652
> 2017-11-10 19:26:58,875 INFO  
> [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16000] 
> master.ServerManager: Registering 
> server=hadoop-slave1.hadoop,16020,1510342016106
> 2017-11-10 19:27:05,832 INFO  
> [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16000] 
> master.ServerManager: Registering 
> server=hadoop-slave2.hadoop,16020,1510342023184
> 2017-11-10 19:27:05,832 INFO  
> [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16000] 
> master.ServerManager: Triggering server recovery; existingServer 
> hadoop-slave2.hadoop,16020,1510341988652 looks stale, new 
> server:hadoop-slave2.hadoop,16020,1510342023184
> 2017-11-10 19:27:05,832 INFO  
> [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16000] 
> master.ServerManager: Master doesn't enable ServerShutdownHandler during 
> initialization, delay expiring server hadoop-slave2.hadoop,16020,1510341988652
> 2017-11-10 19:27:49,815 INFO  
> [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16000] 
> client.RpcRetryingCallerImpl: tarted=38594 ms ago, cancelled=false, 
> msg=org.apache.hadoop.hbase.NotServingRegionException: hbase:meta,,1 is not 
> online on hadoop-slave2.hadoop,16020,1510342023184
>         at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3290)
>         at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1370)
>         at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2401)
>         at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41544)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:406)
>         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
>         at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:278)
>         at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:258)
>  row 'hbase:namespace' on table 'hbase:meta' at 
> region=hbase:meta,,1.1588230740, 
> hostname=hadoop-slave2.hadoop,16020,1510341988652, seqNum=0



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to