[
https://issues.apache.org/jira/browse/HBASE-24548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Junhong Xu resolved HBASE-24548.
--------------------------------
Resolution: Not A Problem
> improvement for HBase RS Stop
> ------------------------------
>
> Key: HBASE-24548
> URL: https://issues.apache.org/jira/browse/HBASE-24548
> Project: HBase
> Issue Type: Improvement
> Reporter: Junhong Xu
> Assignee: Junhong Xu
> Priority: Major
>
> In our internal HBase, which is based on the community branch-2.1, we found
> that after a regionserver is stopped, it takes about 30 s before the master
> finally detects it as dead, via the deletion of its ephemeral node in ZK.
> During this time, the regions on this server are unavailable and make no
> progress. The client log is as follows:
> {code:java}
> [2020-06-12 15:51:41.888
> ActorThreadPool-consumer-processor-talos-set-alias-55-1 ERROR
> c.x.xmpush.hbase.utils.HBaseHelper] [get data hbase failed, tableName =
> mipush:app_alias_new]
> com.xiaomi.infra.hbase.client.HException:
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
> attempts=10, exceptions:
> Fri Jun 12 15:50:44 CST 2020,
> org.apache.hadoop.hbase.client.RpcRetryingCaller@2dc1865,
> org.apache.hadoop.hbase.regionserver.RegionServerStoppedException:
> org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: Server
> c3-hadoop-srv-st639.bj,13700,1591932264018 stopping
> at
> org.apache.hadoop.hbase.regionserver.RSRpcServices.checkOpen(RSRpcServices.java:1551)
> at
> org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2565)
> at
> org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41998)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:134)
> at
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
> at
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)
> Fri Jun 12 15:50:44 CST 2020,
> org.apache.hadoop.hbase.client.RpcRetryingCaller@2dc1865,
> org.apache.hadoop.hbase.regionserver.RegionServerStoppedException:
> org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: Server
> c3-hadoop-srv-st639.bj,13700,1591932264018 stopping
> at
> org.apache.hadoop.hbase.regionserver.RSRpcServices.checkOpen(RSRpcServices.java:1551)
> at
> org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2565)
> at
> org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41998)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:134)
> at
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
> at
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)
> {code}
> The logs in master:
> {code:java}
> 2020-06-12,15:51:12,003 INFO [RegionServerTracker-0]
> org.apache.hadoop.hbase.master.RegionServerTracker: RegionServer ephemeral
> node deleted, processing expiration
> [c3-hadoop-srv-st639.bj,13700,1591932264018]
> 2020-06-12,15:51:12,003 INFO [RegionServerTracker-0]
> org.apache.hadoop.hbase.master.ServerManager: Processing expiration of
> c3-hadoop-srv-st639.bj,13700,1591932264018 on
> c3-hadoop-miui-zk05.bj,13600,1591927126881
> 2020-06-12,15:51:12,109 INFO [RegionServerTracker-0]
> org.apache.hadoop.hbase.master.assignment.AssignmentManager: Added
> c3-hadoop-srv-st639.bj,13700,1591932264018 to dead servers which
> carryingMeta=false, submitted ServerCrashProcedure pid=97428
> 2020-06-12,15:51:12,109 INFO
> [org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$ServerEventsListenerThread-c3-hadoop-miui-zk05.bj,13600,1591927126881]
>
> org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$ServerEventsListenerThread:
> Updating default servers.
> 2020-06-12,15:51:12,111 INFO [PEWorker-11]
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Start
> pid=97428, state=RUNNABLE:SERVER_CRASH_START, locked=true;
> ServerCrashProcedure server=c3-hadoop-srv-st639.bj,13700,1591932264018,
> splitWal=true, meta=false
> {code}
> After discussing this offline with [~zghao], we think this process could be
> accelerated by having the regionserver either send a message to the master or
> delete its own ephemeral node before it stops.
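
The second option can be illustrated with a toy model. This is only a sketch under stated assumptions, not HBase or ZooKeeper code: `ToyRegistry`, `FastStopSketch`, and `detectionDelayMs` are hypothetical names, the registry is an in-memory stand-in for ZK, and the 30 s expiry is chosen only to mirror the delay reported above (the issue does not state what actually causes that delay). It compares how long a listener playing the master's role waits when the node is removed only on expiry with when the stopping server deletes the node itself.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Toy in-memory stand-in for the coordination service; NOT the ZooKeeper API. */
class ToyRegistry {
    private final Map<String, Long> expiresAt = new HashMap<>(); // server -> expiry time (ms)
    private final List<Consumer<String>> listeners = new ArrayList<>();
    private final long expiryMs; // assumed expiry window, hypothetical

    ToyRegistry(long expiryMs) { this.expiryMs = expiryMs; }

    /** Registers a callback that fires when a server's node disappears. */
    void onNodeDeleted(Consumer<String> listener) { listeners.add(listener); }

    /** Creates or refreshes the ephemeral node for a server. */
    void heartbeat(String server, long nowMs) { expiresAt.put(server, nowMs + expiryMs); }

    /** Explicit deletion: listeners are notified right away. */
    void deleteNode(String server) {
        if (expiresAt.remove(server) != null) {
            listeners.forEach(l -> l.accept(server));
        }
    }

    /** Advances the logical clock and removes nodes whose expiry has passed. */
    void tick(long nowMs) {
        List<String> expired = new ArrayList<>();
        expiresAt.forEach((server, t) -> { if (t <= nowMs) expired.add(server); });
        expired.forEach(this::deleteNode);
    }
}

class FastStopSketch {
    /** Returns how long (ms) after the stop the listener learns the server is gone. */
    static long detectionDelayMs(boolean deleteNodeOnStop) {
        final long expiryMs = 30_000L; // assumption, mirrors the ~30 s in the report
        final long stopAtMs = 1_000L;
        final long[] now = {0L};
        final long[] detectedAt = {-1L};

        ToyRegistry registry = new ToyRegistry(expiryMs);
        registry.onNodeDeleted(server -> detectedAt[0] = now[0]); // "master" side
        registry.heartbeat("rs1", stopAtMs); // last heartbeat happens at stop time

        now[0] = stopAtMs;
        if (deleteNodeOnStop) {
            registry.deleteNode("rs1"); // proposed behaviour: delete before stopping
        }
        // Otherwise the node only goes away once the expiry window has elapsed.
        while (detectedAt[0] < 0 && now[0] <= stopAtMs + 2 * expiryMs) {
            registry.tick(now[0]);
            if (detectedAt[0] < 0) now[0] += 1_000L;
        }
        return detectedAt[0] - stopAtMs;
    }

    public static void main(String[] args) {
        System.out.println("delete on stop: " + detectionDelayMs(true) + " ms");
        System.out.println("wait for expiry: " + detectionDelayMs(false) + " ms");
    }
}
```

In this model the explicit delete lets the listener react at stop time instead of waiting out the expiry window. Whether a real regionserver behaves this way depends on its shutdown sequence; note that this issue was resolved as Not A Problem.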