[ https://issues.apache.org/jira/browse/HBASE-23597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17000646#comment-17000646 ]
Lijin Bin commented on HBASE-23597: ----------------------------------- [~stack] It doesn't matter. Also is there any thing that we can do to help you for you problem? > Give high priority for meta assign procedure and ServerCrashProcedure which > carry meta. > --------------------------------------------------------------------------------------- > > Key: HBASE-23597 > URL: https://issues.apache.org/jira/browse/HBASE-23597 > Project: HBase > Issue Type: Improvement > Affects Versions: 2.2.2 > Reporter: Lijin Bin > Priority: Major > > {code} > 2019-12-18 16:14:41,698 INFO [KeepAlivePEWorker-18] > assignment.TransitRegionStateProcedure: Starting pid=23568, ppid=23567, > state=RUNNABLE:REGION_STATE_TRANSITION_OPEN, locked=true; > TransitRegionStateProcedure table=hbase:meta, region=1588230740, ASSIGN; > openRegion rit=OPEN, location=100.107.165.61,60020,1576553057082; > loc=100.107.165.61,60020,1576553057082 > 2019-12-18 16:14:41,698 INFO [KeepAlivePEWorker-18] > zookeeper.MetaTableLocator: Setting hbase:meta (replicaId=0) location in > ZooKeeper as 100.107.165.61,60020,1576553057082 > 2019-12-18 16:14:43,515 INFO [PEWorker-2] procedure2.ProcedureExecutor: > Finished subprocedure pid=23569, resume processing parent pid=23568, > ppid=23567, state=RUNNABLE:REGION_STATE_TRANSITION_CONFIRM_OPENED, > locked=true; TransitRegionStateProcedure table=hbase:meta, region=1588230740, > ASSIGN > 2019-12-18 16:14:43,518 INFO [PEWorker-2] procedure2.ProcedureExecutor: > Finished pid=23569, ppid=23568, state=SUCCESS; > org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure in 1.5970sec > 2019-12-18 16:14:43,522 INFO [PEWorker-9] procedure2.ProcedureExecutor: > Finished subprocedure pid=23568, resume processing parent pid=23567, > state=RUNNABLE:SERVER_CRASH_GET_REGIONS, locked=true; ServerCrashProcedure > server=100.107.165.22,60020,1576553019781, splitWal=true, meta=true > 2019-12-18 16:14:43,522 INFO [PEWorker-9] procedure2.ProcedureExecutor: > Finished pid=23568, ppid=23567, state=SUCCESS; TransitRegionStateProcedure > table=hbase:meta, region=1588230740, ASSIGN in 6.4630sec > 2019-12-18 16:15:07,212 DEBUG [RegionServerTracker-0] > procedure2.ProcedureExecutor: Stored pid=69619, > state=RUNNABLE:SERVER_CRASH_START; ServerCrashProcedure > server=100.107.165.61,60020,1576553057082, splitWal=true, meta=true > 2019-12-18 16:15:07,212 DEBUG [RegionServerTracker-0] > procedure.MasterProcedureScheduler: Add > ServerQueue(100.107.165.61,60020,1576553057082, xlock=false sharedLock=0 > size=1) to run queue because: the exclusive lock is not held by anyone when > adding pid=69619, state=RUNNABLE:SERVER_CRASH_START; ServerCrashProcedure > server=100.107.165.61,60020,1576553057082, splitWal=true, meta=true > 2019-12-18 16:15:07,212 INFO [RegionServerTracker-0] > assignment.AssignmentManager: Scheduled SCP pid=69619 for > 100.107.165.61,60020,1576553057082 (carryingMeta=true) > 100.107.165.61,60020,1576553057082/CRASHED/regionCount=13026/lock=java.util.concurrent.locks.ReentrantReadWriteLock@68f2ee72[Write > locks = 1, Read locks = 0], oldState=ONLINE. > 2019-12-18 16:15:21,629 DEBUG > [RpcServer.default.FPBQ.Fifo.handler=959,queue=191,port=60000] > master.DeadServer: Removed 100.107.165.61,60020,1576553057082, > processing=true, numProcessing=0 > 2019-12-18 16:16:14,779 DEBUG [qtp1688526221-1038] > client.ConnectionImplementation: locateRegionInMeta parentTable='hbase:meta', > attempt=0 of 31 failed; retrying after sleep of 31 > org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after > attempts=31, exceptions: > Wed Dec 18 16:16:14 CST 2019, null, java.net.SocketTimeoutException: > callTimeout=60000, callDuration=81362: > org.apache.hadoop.hbase.NotServingRegionException: hbase:meta,,1 is not > online on 100.107.165.61,60020,1576656916048 > at > org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3349) > at > org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3326) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1439) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.newRegionScanner(RSRpcServices.java:2967) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3300) > at > org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42190) > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413) > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318) > row 'hbase:rsgroup,,99999999999999' on table 'hbase:meta' at > region=hbase:meta,,1.1588230740, hostname=100.107.165.61,60020,1576553057082, > seqNum=-1 > 2019-12-18 16:33:04,715 ERROR [PEWorker-15] master.HMaster: Master server > abort: loaded coprocessors are: > [org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint] > 2019-12-18 16:33:04,715 ERROR [PEWorker-15] master.HMaster: ***** ABORTING > master 100.107.176.225,60000,1576656778460: FAILED persisting > region=38d18fd824890c80cff972cbf2e4c174 state=OPENING ***** > java.net.SocketTimeoutException: callTimeout=1200000, callDuration=1286005: > org.apache.hadoop.hbase.NotServingRegionException: hbase:meta,,1 is not > online on 100.107.165.61,60020,1576656916048 > at > org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3349) > at > org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3326) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1439) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.mutate(RSRpcServices.java:2772) > at > org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42188) > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413) > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318) > row 'table1w,user22322,1575810347321.38d18fd824890c80cff972cbf2e4c174.' on > table 'hbase:meta' at region=hbase:meta,,1.1588230740, > hostname=100.107.165.61,60020,1576553057082, seqNum=-1 > at > org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:159) > at org.apache.hadoop.hbase.client.HTable.put(HTable.java:540) > at > org.apache.hadoop.hbase.master.assignment.RegionStateStore.updateRegionLocation(RegionStateStore.java:209) > at > org.apache.hadoop.hbase.master.assignment.RegionStateStore.updateUserRegionLocation(RegionStateStore.java:203) > at > org.apache.hadoop.hbase.master.assignment.RegionStateStore.updateRegionLocation(RegionStateStore.java:141) > at > org.apache.hadoop.hbase.master.assignment.AssignmentManager.transitStateAndUpdate(AssignmentManager.java:1632) > at > org.apache.hadoop.hbase.master.assignment.AssignmentManager.regionOpening(AssignmentManager.java:1647) > at > org.apache.hadoop.hbase.master.assignment.TransitRegionStateProcedure.openRegion(TransitRegionStateProcedure.java:207) > at > org.apache.hadoop.hbase.master.assignment.TransitRegionStateProcedure.executeFromState(TransitRegionStateProcedure.java:339) > at > org.apache.hadoop.hbase.master.assignment.TransitRegionStateProcedure.executeFromState(TransitRegionStateProcedure.java:102) > at > org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:194) > at > org.apache.hadoop.hbase.master.assignment.TransitRegionStateProcedure.execute(TransitRegionStateProcedure.java:319) > at > org.apache.hadoop.hbase.master.assignment.TransitRegionStateProcedure.execute(TransitRegionStateProcedure.java:102) > at > org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:962) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1648) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1395) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1100(ProcedureExecutor.java:78) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1965) > Caused by: org.apache.hadoop.hbase.NotServingRegionException: > org.apache.hadoop.hbase.NotServingRegionException: hbase:meta,,1 is not > online on 100.107.165.61,60020,1576656916048 > at > org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3349) > at > org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3326) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1439) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.mutate(RSRpcServices.java:2772) > at > org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42188) > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413) > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318) > at sun.reflect.GeneratedConstructorAccessor14.newInstance(Unknown > Source) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at > org.apache.hadoop.hbase.ipc.RemoteWithExtrasException.instantiateException(RemoteWithExtrasException.java:99) > at > org.apache.hadoop.hbase.ipc.RemoteWithExtrasException.unwrapRemoteException(RemoteWithExtrasException.java:89) > at > org.apache.hadoop.hbase.protobuf.ProtobufUtil.makeIOExceptionOfException(ProtobufUtil.java:282) > at > org.apache.hadoop.hbase.protobuf.ProtobufUtil.handleRemoteException(ProtobufUtil.java:269) > at > org.apache.hadoop.hbase.client.RegionServerCallable.call(RegionServerCallable.java:129) > at > org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:107) > ... 17 more > Caused by: > org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.NotServingRegionException): > org.apache.hadoop.hbase.NotServingRegionException: hbase:meta,,1 is not > online on 100.107.165.61,60020,1576656916048 > at > org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3349) > at > org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3326) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1439) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.mutate(RSRpcServices.java:2772) > at > org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42188) > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413) > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318) > at > org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:389) > at > org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:97) > at > org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:423) > at > org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:419) > at org.apache.hadoop.hbase.ipc.Call.callComplete(Call.java:103) > at org.apache.hadoop.hbase.ipc.Call.setException(Call.java:118) > at > org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.readResponse(NettyRpcDuplexHandler.java:162) > at > org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.channelRead(NettyRpcDuplexHandler.java:192) > at > org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:359) > 2019-12-18 16:33:04,728 DEBUG [PEWorker-15] > procedure.MasterProcedureScheduler: Remove > ServerQueue(100.107.165.61,60020,1576553057082, xlock=false sharedLock=0 > size=0) from run queue because: queue is empty after polling out pid=69619, > state=RUNNABLE:SERVER_CRASH_START; ServerCrashProcedure > server=100.107.165.61,60020,1576553057082, splitWal=true, meta=true > 2019-12-18 16:33:04,728 DEBUG [PEWorker-15] > procedure.MasterProcedureScheduler: Remove > ServerQueue(100.107.165.61,60020,1576553057082, xlock=true (69619) > sharedLock=0 size=0) from run queue because: pid=69619, > state=RUNNABLE:SERVER_CRASH_START; ServerCrashProcedure > server=100.107.165.61,60020,1576553057082, splitWal=true, meta=true held > exclusive lock > 2019-12-18 16:33:06,353 INFO [PEWorker-13] > assignment.TransitRegionStateProcedure: Starting pid=69626, ppid=69619, > state=RUNNABLE:REGION_STATE_TRANSITION_OPEN, locked=true; > TransitRegionStateProcedure table=hbase:meta, region=1588230740, ASSIGN; > openRegion rit=OPEN, location=100.107.165.22,60020,1576656877749; > loc=100.107.165.22,60020,1576656877749 > 2019-12-18 16:33:06,353 INFO [PEWorker-13] zookeeper.MetaTableLocator: > Setting hbase:meta (replicaId=0) location in ZooKeeper as > 100.107.165.22,60020,1576656877749 > {code} > Looks at this logs, we can see the ServerCrashProcedure > server=100.107.165.61,60020,1576553057082(carry meta) queued for quit a while > blocked by other procedure. -- This message was sent by Atlassian Jira (v8.3.4#803005)