[ 
https://issues.apache.org/jira/browse/HBASE-27983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17850897#comment-17850897
 ] 

JianDa Gao commented on HBASE-27983:
------------------------------------

Will restarting the master cause the regionserver to crash?

> The RSGroupAdminEndpoint is causing the hbase:meta region to be unable to 
> come online.
> --------------------------------------------------------------------------------------
>
>                 Key: HBASE-27983
>                 URL: https://issues.apache.org/jira/browse/HBASE-27983
>             Project: HBase
>          Issue Type: Bug
>          Components: rsgroup
>    Affects Versions: 2.1.1
>         Environment: *Hardware:*
> Red Hat Enterprise Linux Server release 7.9 (Maipo)
> HDD 12 * 50G
> 16 cores
>  
> *Software:*
> HBase Version : 2.1.1
> Hadoop Version : hadoop-2.7.2
> Zookeeper Version : 3.5.7
>  
> *Roles:*
> HBase is configured in master high-availability (HA) mode with two 
> masters and three regionservers.
> ||host||role||
> |ysl102-qax.com|master regionserver|
> |ysl103-qax.com|master regionserver|
> |ysl104-qax.com|regionserver |
>  
>            Reporter: JianDa Gao
>            Priority: Major
>         Attachments: hbase-hbase-master-YSL104-QAX.COM.log
>
>
> When I use RSGroupAdminEndpoint and restart both masters before restarting the 
> regionservers, I encounter a syscall:getsockopt(..) error that prevents the 
> hbase:meta region from coming online, resulting in a service exception.
> {code:java}
> 2023-07-10 16:32:22,282 INFO  
> [org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$ServerEventsListenerThread-ysl104-qax.com,16000,1688977910162]
>  rsgroup.RSGroupInfoManagerImpl$ServerEventsListenerThread: Updating default 
> servers.
> 2023-07-10 16:32:22,299 INFO  [PEWorker-8] procedure.ServerCrashProcedure: 
> Start pid=2, state=RUNNABLE:SERVER_CRASH_START, locked=true; 
> ServerCrashProcedure server=ysl102-qax.com,16020,1688977249460, 
> splitWal=true, meta=false
> 2023-07-10 16:32:22,400 INFO  [PEWorker-10] master.SplitLogManager: 
> hdfs://HACluster/home/hbase/WALs/ysl102-qax.com,16020,1688977249460-splitting 
> dir is empty, no logs to split.
> 2023-07-10 16:32:22,411 INFO  [PEWorker-10] master.SplitLogManager: Finished 
> splitting (more than or equal to) 0 bytes in 0 log files in 
> [hdfs://HACluster/home/hbase/WALs/ysl102-qax.com,16020,1688977249460-splitting]
>  in 0ms
> 2023-07-10 16:32:22,521 INFO  [PEWorker-10] procedure2.ProcedureExecutor: 
> Finished pid=2, state=SUCCESS; ServerCrashProcedure 
> server=ysl102-qax.com,16020,1688977249460, splitWal=true, meta=false in 
> 326msec
> 2023-07-10 16:32:22,941 INFO  [RegionServerTracker-0] 
> master.RegionServerTracker: RegionServer ephemeral node deleted, processing 
> expiration [ysl104-qax.com,16020,1688977251592]
> 2023-07-10 16:32:22,941 INFO  [RegionServerTracker-0] master.ServerManager: 
> Processing expiration of ysl104-qax.com,16020,1688977251592 on 
> ysl104-qax.com,16000,1688977910162
> 2023-07-10 16:32:23,069 INFO  [PEWorker-12] procedure.ServerCrashProcedure: 
> Start pid=3, state=RUNNABLE:SERVER_CRASH_START, locked=true; 
> ServerCrashProcedure server=ysl104-qax.com,16020,1688977251592, 
> splitWal=true, meta=true
> 2023-07-10 16:32:23,126 INFO  [PEWorker-12] master.SplitLogManager: 
> hdfs://HACluster/home/hbase/WALs/ysl104-qax.com,16020,1688977251592-splitting 
> dir is empty, no logs to split.
> 2023-07-10 16:32:23,135 INFO  [PEWorker-12] master.SplitLogManager: Finished 
> splitting (more than or equal to) 0 bytes in 0 log files in 
> [hdfs://HACluster/home/hbase/WALs/ysl104-qax.com,16020,1688977251592-splitting]
>  in 0ms
> 2023-07-10 16:32:23,174 INFO  [PEWorker-12] procedure2.ProcedureExecutor: 
> Initialized subprocedures=[{pid=4, ppid=3, 
> state=RUNNABLE:REGION_TRANSITION_QUEUE; AssignProcedure table=hbase:meta, 
> region=1588230740}]
> 2023-07-10 16:32:23,206 INFO  [PEWorker-15] 
> procedure.MasterProcedureScheduler: Took xlock for pid=4, ppid=3, 
> state=RUNNABLE:REGION_TRANSITION_QUEUE; AssignProcedure table=hbase:meta, 
> region=1588230740
> 2023-07-10 16:32:23,325 INFO  [PEWorker-15] assignment.AssignProcedure: 
> Starting pid=4, ppid=3, state=RUNNABLE:REGION_TRANSITION_QUEUE, locked=true; 
> AssignProcedure table=hbase:meta, region=1588230740; rit=OFFLINE, 
> location=ysl104-qax.com,16020,1688977251592; forceNewPlan=false, retain=true
> 2023-07-10 16:32:23,476 WARN  [master/YSL104-QAX:16000] 
> assignment.AssignmentManager: No servers available; cannot place 1 unassigned 
> regions.
> 2023-07-10 16:32:24,477 WARN  [master/YSL104-QAX:16000] 
> assignment.AssignmentManager: No servers available; cannot place 1 unassigned 
> regions.
> 2023-07-10 16:32:25,478 WARN  [master/YSL104-QAX:16000] 
> assignment.AssignmentManager: No servers available; cannot place 1 unassigned 
> regions.
> 2023-07-10 16:32:26,479 WARN  [master/YSL104-QAX:16000] 
> assignment.AssignmentManager: No servers available; cannot place 1 unassigned 
> regions.
> 2023-07-10 16:32:26,665 INFO  
> [org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$ServerEventsListenerThread-ysl104-qax.com,16000,1688977910162]
>  client.RpcRetryingCallerImpl: Call exception, tries=6, retries=46, 
> started=4175 ms ago, cancelled=false, msg=Call to 
> YSL104-QAX.COM/10.59.12.104:16020 failed on connection exception: 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException:
>  syscall:getsockopt(..) failed: Connection refused: 
> YSL104-QAX.COM/xx.xx.xx.104:16020, details=row 'hbase:rsgroup' on table 
> 'hbase:meta' at region=hbase:meta,,1.1588230740, 
> hostname=ysl104-qax.com,16020,1688977251592, seqNum=-1
> 2023-07-10 16:32:27,480 WARN  [master/YSL104-QAX:16000] 
> assignment.AssignmentManager: No servers available; cannot place 1 unassigned 
> regions.
> 2023-07-10 16:32:28,481 WARN  [master/YSL104-QAX:16000] 
> assignment.AssignmentManager: No servers available; cannot place 1 unassigned 
> regions.
> 2023-07-10 16:32:29,482 WARN  [master/YSL104-QAX:16000] 
> assignment.AssignmentManager: No servers available; cannot place 1 unassigned 
> regions.
> 2023-07-10 16:32:30,483 WARN  [master/YSL104-QAX:16000] 
> assignment.AssignmentManager: No servers available; cannot place 1 unassigned 
> regions.
> 2023-07-10 16:32:30,899 INFO  
> [org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$ServerEventsListenerThread-ysl104-qax.com,16000,1688977910162]
>  client.RpcRetryingCallerImpl: Call exception, tries=7, retries=46, 
> started=8409 ms ago, cancelled=false, msg=Connection closed, details=row 
> 'hbase:rsgroup' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, 
> hostname=ysl104-qax.com,16020,1688977251592, seqNum=-1
> 2023-07-10 16:32:31,025 INFO  
> [RpcServer.default.FPBQ.Fifo.handler=198,queue=18,port=16000] 
> master.ServerManager: Registering 
> regionserver=ysl103-qax.com,16020,1688977946684
> 2023-07-10 16:32:31,064 INFO  [RegionServerTracker-0] 
> master.RegionServerTracker: RegionServer ephemeral node created, adding 
> [ysl103-qax.com,16020,1688977946684]
> 2023-07-10 16:32:31,439 INFO  
> [RpcServer.default.FPBQ.Fifo.handler=198,queue=18,port=16000] 
> master.ServerManager: Registering 
> regionserver=ysl102-qax.com,16020,1688977947399
> 2023-07-10 16:32:31,467 INFO  [RegionServerTracker-0] 
> master.RegionServerTracker: RegionServer ephemeral node created, adding 
> [ysl102-qax.com,16020,1688977947399]
> 2023-07-10 16:32:32,934 INFO  
> [RpcServer.default.FPBQ.Fifo.handler=198,queue=18,port=16000] 
> master.ServerManager: Registering 
> regionserver=ysl104-qax.com,16020,1688977948804
> 2023-07-10 16:32:32,965 INFO  [RegionServerTracker-0] 
> master.RegionServerTracker: RegionServer ephemeral node created, adding 
> [ysl104-qax.com,16020,1688977948804]
> 2023-07-10 16:32:41,041 INFO  
> [org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$ServerEventsListenerThread-ysl104-qax.com,16000,1688977910162]
>  client.RpcRetryingCallerImpl: ption: hbase:meta,,1 is not online on 
> ysl104-qax.com,16020,1688977948804
>     at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3316)
>     at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3293)
>     at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1431)
>     at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2449)
>     at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41998)
>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
>     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
>     at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>     at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> , details=row 'hbase:rsgroup' on table 'hbase:meta' at 
> region=hbase:meta,,1.1588230740, hostname=ysl104-qax.com,16020,1688977251592, 
> seqNum=-1
> 2023-07-10 16:32:51,123 INFO  
> [org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$ServerEventsListenerThread-ysl104-qax.com,16000,1688977910162]
>  client.RpcRetryingCallerImpl: ption: hbase:meta,,1 is not online on 
> ysl104-qax.com,16020,1688977948804
>     at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3316)
>     at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3293)
>     at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1431)
>     at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2449)
>     at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41998)
>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
>     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
>     at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>     at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> , details=row 'hbase:rsgroup' on table 'hbase:meta' at 
> region=hbase:meta,,1.1588230740, hostname=ysl104-qax.com,16020,1688977251592, 
> seqNum=-1
> 2023-07-10 16:33:01,212 INFO  
> [org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$ServerEventsListenerThread-ysl104-qax.com,16000,1688977910162]
>  client.RpcRetryingCallerImpl: ption: hbase:meta,,1 is not online on 
> ysl104-qax.com,16020,1688977948804
>     at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3316)
>     at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3293)
>     at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1431)
>     at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2449)
>     at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41998)
>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
>     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
>     at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>     at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> , details=row 'hbase:rsgroup' on table 'hbase:meta' at 
> region=hbase:meta,,1.1588230740, hostname=ysl104-qax.com,16020,1688977251592, 
> seqNum=-1 {code}
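
One detail in the log above may be worth checking: the RSGroup listener thread keeps retrying against hostname=ysl104-qax.com,16020,1688977251592, while the regionserver on that host re-registered as ysl104-qax.com,16020,1688977948804. The following is a minimal sketch, not HBase code, that compares two such server names taken from log lines. It assumes the usual host,port,startcode layout of a ServerName string; the helper names parse_server_name and is_stale are hypothetical, and the result does not by itself establish the root cause.

```python
import re

# Assumed ServerName layout as printed in the log: host,port,startcode
SERVER_NAME = re.compile(r"([\w.-]+),(\d+),(\d+)")


def parse_server_name(text):
    """Return (host, port, startcode) for the first server name found in text."""
    match = SERVER_NAME.search(text)
    if match is None:
        raise ValueError("no server name found in: %r" % text)
    host, port, startcode = match.groups()
    return host.lower(), int(port), int(startcode)


def is_stale(cached, registered):
    """True if both names share host and port but `cached` has an older start code."""
    c_host, c_port, c_start = parse_server_name(cached)
    r_host, r_port, r_start = parse_server_name(registered)
    return (c_host, c_port) == (r_host, r_port) and c_start < r_start


# Fragments copied from the log excerpt above.
cached = "hostname=ysl104-qax.com,16020,1688977251592, seqNum=-1"
registered = "Registering regionserver=ysl104-qax.com,16020,1688977948804"

print(is_stale(cached, registered))  # True
```

Here the location used for the hbase:meta lookups carries the start code of the regionserver instance that was already expired at 16:32:22,941, which matches the "Connection refused" and "is not online" messages that follow.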



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
