[ https://issues.apache.org/jira/browse/HBASE-20644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ted Yu updated HBASE-20644:
---------------------------
Description:
From hbase-hbase-master-ctr-e138-1518143905142-329221-01-000003.hwx.site.log:
{code}
2018-05-23 22:14:29,750 ERROR [master/ctr-e138-1518143905142-329221-01-000003:20000] master.HMaster: Failed to become active master
java.lang.IllegalStateException: Expected the service ClusterSchemaServiceImpl [FAILED] to be RUNNING, but the service has FAILED
    at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:345)
    at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.java:291)
    at org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1054)
    at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:918)
    at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2023)
{code}
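The exception is raised by the lifecycle check in the shaded Guava `AbstractService` that `HMaster` waits on through `awaitRunning`. As a rough illustration only, here is a simplified, hypothetical model of that check written without Guava; `ServiceLifecycleSketch`, `markFailed` and `checkRunning` are made-up names, not Guava or HBase API. It shows how a service that has already entered `FAILED` surfaces as this `IllegalStateException`.

```java
/**
 * Simplified, hypothetical model of the lifecycle check that produced
 * "Expected the service ... to be RUNNING, but the service has FAILED".
 * This is NOT Guava's AbstractService; it only mirrors the final state test.
 */
class ServiceLifecycleSketch {
    enum State { NEW, STARTING, RUNNING, STOPPING, TERMINATED, FAILED }

    private final String name;
    private volatile State state = State.NEW;

    ServiceLifecycleSketch(String name) {
        this.name = name;
    }

    void markRunning() { state = State.RUNNING; }

    // Startup threw an exception; the cause itself is not modeled in this sketch.
    void markFailed() { state = State.FAILED; }

    /**
     * Checks the state once. Unlike Guava's awaitRunning, this sketch does not
     * block while the service is still STARTING; any non-RUNNING state throws.
     */
    void checkRunning() {
        State current = state;
        if (current != State.RUNNING) {
            // "name [STATE]" imitates how the service describes itself in the log line.
            throw new IllegalStateException("Expected the service " + name + " [" + current
                    + "] to be RUNNING, but the service has " + current);
        }
    }

    public static void main(String[] args) {
        ServiceLifecycleSketch svc = new ServiceLifecycleSketch("ClusterSchemaServiceImpl");
        svc.markFailed();
        try {
            svc.checkRunning();
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Per the stack trace, the master hits this check from `initClusterSchemaService`, so a failed schema service aborts master initialization.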
Earlier in the log, the namespace region 01a7f9ba9fffd691f261d3fbc620da06 was
deemed OPEN on 01-000007.hwx.site,16020,1527112194788, a server that was then
declared not online:
{code}
2018-05-23 21:54:34,786 INFO [master/ctr-e138-1518143905142-329221-01-000003:20000] assignment.RegionStateStore: Load hbase:meta entry region=01a7f9ba9fffd691f261d3fbc620da06, regionState=OPEN, lastHost=ctr-e138-1518143905142-329221-01-000007.hwx.site,16020,1527112194788, regionLocation=ctr-e138-1518143905142-329221-01-000007.hwx.site,16020,1527112194788, seqnum=43
2018-05-23 21:54:34,787 INFO [master/ctr-e138-1518143905142-329221-01-000003:20000] assignment.AssignmentManager: Number of RegionServers=1
2018-05-23 21:54:34,788 INFO [master/ctr-e138-1518143905142-329221-01-000003:20000] assignment.AssignmentManager: KILL RegionServer=ctr-e138-1518143905142-329221-01-000007.hwx.site,16020,1527112194788 hosting regions but not online.
{code}
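The KILL line reflects a consistency check made after `hbase:meta` is loaded: a server that meta lists as a region location, but that is not among the online RegionServers, is treated as dead. A minimal sketch of that idea, assuming meta is modeled as a map from encoded region name to server-name string; `StaleLocationCheckSketch` and `deadButHosting` are hypothetical names, not the `AssignmentManager` implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical sketch of the "hosting regions but not online" check.
 * Not HBase code: it only compares meta locations against an online set.
 */
class StaleLocationCheckSketch {
    /** Returns every server that hosts at least one region in meta but is not online. */
    static Set<String> deadButHosting(Map<String, String> metaRegionToServer, Set<String> onlineServers) {
        Set<String> dead = new TreeSet<>();
        for (String server : metaRegionToServer.values()) {
            if (!onlineServers.contains(server)) {
                dead.add(server);
            }
        }
        return dead;
    }

    public static void main(String[] args) {
        Map<String, String> meta = new LinkedHashMap<>();
        // Entry loaded from hbase:meta at 21:54:34,786 (values taken from the log above).
        meta.put("01a7f9ba9fffd691f261d3fbc620da06",
                "ctr-e138-1518143905142-329221-01-000007.hwx.site,16020,1527112194788");
        // Hypothetical: the log says one RegionServer was online, but not which one.
        Set<String> online = Set.of("hypothetical-host.example,16020,1");
        System.out.println(deadButHosting(meta, online));
    }
}
```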
Later, a different instance on 007 registered with the master, but the namespace region was not online on it:
{code}
2018-05-23 21:55:13,541 INFO [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=20000] master.ServerManager: Registering regionserver=ctr-e138-1518143905142-329221-01-000007.hwx.site,16020,1527112506002
...
2018-05-23 21:55:43,881 INFO [master/ctr-e138-1518143905142-329221-01-000003:20000] client.RpcRetryingCallerImpl: Call exception, tries=12, retries=12, started=69001 ms ago, cancelled=false, msg=org.apache.hadoop.hbase.NotServingRegionException: hbase:namespace,,1527099443383.01a7f9ba9fffd691f261d3fbc620da06. is not online on ctr-e138-1518143905142-329221-01-000007.hwx.site,16020,1527112506002
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3273)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3250)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1414)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2446)
    at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41998)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131)
{code}
There was no OPEN request for 01a7f9ba9fffd691f261d3fbc620da06 sent to that
server instance.
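The two 007 server names differ only in their last field. HBase server names have the form `hostname,port,startcode`, where the start code is the RegionServer's start time in epoch milliseconds, so the same host and port with a new start code denotes a new process instance. The sketch below is an illustrative stand-in for HBase's `org.apache.hadoop.hbase.ServerName` and does not use its API; `ServerNameSketch` and its methods are hypothetical names.

```java
import java.time.Instant;
import java.util.Objects;

/** Illustrative parser for "hostname,port,startcode" strings; not HBase's ServerName. */
final class ServerNameSketch {
    final String host;
    final int port;
    final long startCode; // RegionServer start time, epoch milliseconds

    ServerNameSketch(String host, int port, long startCode) {
        this.host = host;
        this.port = port;
        this.startCode = startCode;
    }

    static ServerNameSketch parse(String s) {
        String[] parts = s.split(",");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected hostname,port,startcode: " + s);
        }
        return new ServerNameSketch(parts[0].trim(), Integer.parseInt(parts[1].trim()),
                Long.parseLong(parts[2].trim()));
    }

    /** Same machine and port; with a different start code this means a restarted process. */
    boolean sameHostAndPort(ServerNameSketch o) {
        return host.equals(o.host) && port == o.port;
    }

    Instant startTime() {
        return Instant.ofEpochMilli(startCode);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ServerNameSketch)) return false;
        ServerNameSketch other = (ServerNameSketch) o;
        return sameHostAndPort(other) && startCode == other.startCode;
    }

    @Override
    public int hashCode() {
        return Objects.hash(host, port, startCode);
    }

    public static void main(String[] args) {
        ServerNameSketch metaLocation =
                parse("ctr-e138-1518143905142-329221-01-000007.hwx.site,16020,1527112194788");
        ServerNameSketch registered =
                parse("ctr-e138-1518143905142-329221-01-000007.hwx.site,16020,1527112506002");
        System.out.println(metaLocation.sameHostAndPort(registered)); // true
        System.out.println(metaLocation.equals(registered));          // false
        System.out.println(metaLocation.startTime());                 // 2018-05-23T21:49:54.788Z
        System.out.println(registered.startTime());                   // 2018-05-23T21:55:06.002Z
    }
}
```

The decoded start time of the new instance, 21:55:06 UTC, falls just after the 21:55:03 restart recorded in the 007 log.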
From hbase-hbase-regionserver-ctr-e138-1518143905142-329221-01-000007.hwx.site.log:
{code}
2018-05-23 21:52:27,414 INFO [RS_CLOSE_REGION-regionserver/ctr-e138-1518143905142-329221-01-000007:16020-1] regionserver.HRegion: Closed hbase:namespace,,1527099443383.01a7f9ba9fffd691f261d3fbc620da06.
{code}
Then region server 007 restarted:
{code}
Wed May 23 21:55:03 UTC 2018 Starting regionserver on ctr-e138-1518143905142-329221-01-000007.hwx.site
{code}
After the restart, region 01a7f9ba9fffd691f261d3fbc620da06 never appeared again
in the 007 log.
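Putting the quoted log lines in time order makes the sequence easier to follow. This sketch only sorts the timestamps already quoted in this report (all UTC); the RegionServer restart line uses a different format ("Wed May 23 21:55:03 UTC 2018") and is normalized here by hand to `,000` milliseconds, which is an assumption. `IncidentTimelineSketch` is a hypothetical helper, not part of any tool.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Orders the events quoted in this report by their log timestamps. */
class IncidentTimelineSketch {
    static final DateTimeFormatter LOG_TS = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    static final class Event {
        final LocalDateTime at;
        final String what;

        Event(String ts, String what) {
            this.at = LocalDateTime.parse(ts, LOG_TS);
            this.what = what;
        }
    }

    /** Events are added out of order on purpose, then sorted by timestamp. */
    static List<Event> events() {
        List<Event> list = new ArrayList<>();
        list.add(new Event("2018-05-23 22:14:29,750", "master: failed to become active master"));
        list.add(new Event("2018-05-23 21:55:13,541", "master: registering 007 startcode 1527112506002"));
        list.add(new Event("2018-05-23 21:52:27,414", "RS 007: closed hbase:namespace region"));
        list.add(new Event("2018-05-23 21:54:34,786", "master: meta says region OPEN on startcode 1527112194788"));
        list.add(new Event("2018-05-23 21:55:43,881", "master: NotServingRegionException from startcode 1527112506002"));
        list.add(new Event("2018-05-23 21:55:03,000", "RS 007: restart (normalized timestamp)"));
        list.add(new Event("2018-05-23 21:54:34,788", "master: KILL 007 startcode 1527112194788"));
        list.sort(Comparator.comparing((Event e) -> e.at));
        return list;
    }

    public static void main(String[] args) {
        List<Event> sorted = events();
        for (Event e : sorted) {
            System.out.println(e.at + "  " + e.what);
        }
        // Gap between the region closing on 007 and the master reading it from meta as OPEN.
        Duration gap = Duration.between(sorted.get(0).at, sorted.get(1).at);
        System.out.println("close -> meta load: " + gap.toMillis() + " ms");
    }
}
```

Sorted this way, the region was closed on 007 roughly two minutes before the master loaded the meta entry that still recorded it as OPEN.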
was:
From hbase-hbase-master-ctr-e138-1518143905142-329221-01-000003.hwx.site.log:
{code}
2018-05-23 22:14:29,750 ERROR [master/ctr-e138-1518143905142-329221-01-000003:20000] master.HMaster: Failed to become active master
java.lang.IllegalStateException: Expected the service ClusterSchemaServiceImpl [FAILED] to be RUNNING, but the service has FAILED
    at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:345)
    at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.java:291)
    at org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1054)
    at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:918)
    at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2023)
{code}
Earlier in the log, the namespace region was deemed OPEN on
01-000007.hwx.site,16020,1527112194788, a server that was then declared not online:
{code}
2018-05-23 21:54:34,786 INFO [master/ctr-e138-1518143905142-329221-01-000003:20000] assignment.RegionStateStore: Load hbase:meta entry region=01a7f9ba9fffd691f261d3fbc620da06, regionState=OPEN, lastHost=ctr-e138-1518143905142-329221-01-000007.hwx.site,16020,1527112194788, regionLocation=ctr-e138-1518143905142-329221-01-000007.hwx.site,16020,1527112194788, seqnum=43
2018-05-23 21:54:34,787 INFO [master/ctr-e138-1518143905142-329221-01-000003:20000] assignment.AssignmentManager: Number of RegionServers=1
2018-05-23 21:54:34,788 INFO [master/ctr-e138-1518143905142-329221-01-000003:20000] assignment.AssignmentManager: KILL RegionServer=ctr-e138-1518143905142-329221-01-000007.hwx.site,16020,1527112194788 hosting regions but not online.
{code}
Later, a different instance on 007 registered with the master, but the namespace region was not online on it:
{code}
2018-05-23 21:55:13,541 INFO [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=20000] master.ServerManager: Registering regionserver=ctr-e138-1518143905142-329221-01-000007.hwx.site,16020,1527112506002
...
2018-05-23 21:55:43,881 INFO [master/ctr-e138-1518143905142-329221-01-000003:20000] client.RpcRetryingCallerImpl: Call exception, tries=12, retries=12, started=69001 ms ago, cancelled=false, msg=org.apache.hadoop.hbase.NotServingRegionException: hbase:namespace,,1527099443383.01a7f9ba9fffd691f261d3fbc620da06. is not online on ctr-e138-1518143905142-329221-01-000007.hwx.site,16020,1527112506002
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3273)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3250)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1414)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2446)
    at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41998)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131)
{code}
There was no OPEN request sent to that instance.
From hbase-hbase-regionserver-ctr-e138-1518143905142-329221-01-000007.hwx.site.log:
{code}
2018-05-23 21:52:27,414 INFO [RS_CLOSE_REGION-regionserver/ctr-e138-1518143905142-329221-01-000007:16020-1] regionserver.HRegion: Closed hbase:namespace,,1527099443383.01a7f9ba9fffd691f261d3fbc620da06.
{code}
Then region server 007 restarted:
{code}
Wed May 23 21:55:03 UTC 2018 Starting regionserver on ctr-e138-1518143905142-329221-01-000007.hwx.site
{code}
After the restart, region 01a7f9ba9fffd691f261d3fbc620da06 never appeared again
in the 007 log.
> Master shutdown due to service ClusterSchemaServiceImpl failing to start
> ------------------------------------------------------------------------
>
> Key: HBASE-20644
> URL: https://issues.apache.org/jira/browse/HBASE-20644
> Project: HBase
> Issue Type: Bug
> Reporter: Romil Choksi
> Priority: Major
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)