[ https://issues.apache.org/jira/browse/HBASE-13317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14379248#comment-14379248 ]
Jerry He commented on HBASE-13317:
----------------------------------
I thought about how to do the unit test today. We need to use MiniHBaseCluster
to really test the scenario.
But the test needs to intervene in the middle of the mini cluster startup so
that it can simulate the change of master.
> Region server reportForDuty stuck looping if there is a master change
> ---------------------------------------------------------------------
>
> Key: HBASE-13317
> URL: https://issues.apache.org/jira/browse/HBASE-13317
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 1.0.0, 2.0.0, 0.98.12
> Reporter: Jerry He
> Assignee: Jerry He
> Fix For: 2.0.0, 1.0.1, 0.98.13
>
> Attachments: HBASE-13317-0.98-v2.patch, HBASE-13317-0.98.patch
>
>
> During cluster startup, region server reportForDuty gets stuck looping if
> there is a master change.
> {noformat}
> 2015-03-22 11:15:16,186 INFO [regionserver60020] regionserver.HRegionServer: reportForDuty to master=bigaperf274,60000,1427045883965 with port=60020, startcode=1427048115174
> 2015-03-22 11:15:16,272 WARN [regionserver60020] regionserver.HRegionServer: error telling master we are up
> com.google.protobuf.ServiceException: java.net.ConnectException: Connection refused
>     at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1678)
>     at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1719)
>     at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$BlockingStub.regionServerStartup(RegionServerStatusProtos.java:8277)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:2137)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:896)
>     at java.lang.Thread.run(Thread.java:745)
> 2015-03-22 11:15:16,274 WARN [regionserver60020] regionserver.HRegionServer: reportForDuty failed; sleeping and then retrying.
> 2015-03-22 11:15:19,274 INFO [regionserver60020] regionserver.HRegionServer: reportForDuty to master=bigaperf273,60000,1427048108439 with port=60020, startcode=1427048115174
> 2015-03-22 11:15:19,275 WARN [regionserver60020] regionserver.HRegionServer: error telling master we are up
> com.google.protobuf.ServiceException: java.net.ConnectException: Connection refused
>     at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1678)
>     at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1719)
>     at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$BlockingStub.regionServerStartup(RegionServerStatusProtos.java:8277)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:2137)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:896)
>     at java.lang.Thread.run(Thread.java:745)
> 2015-03-22 11:15:19,276 WARN [regionserver60020] regionserver.HRegionServer: reportForDuty failed; sleeping and then retrying.
> 2015-03-22 11:15:22,276 INFO [regionserver60020] regionserver.HRegionServer: reportForDuty to master=bigaperf273,60000,1427048108439 with port=60020, startcode=1427048115174
> 2015-03-22 11:15:22,296 DEBUG [regionserver60020] regionserver.HRegionServer: Master is not running yet
> 2015-03-22 11:15:22,296 WARN [regionserver60020] regionserver.HRegionServer: reportForDuty failed; sleeping and then retrying.
> 2015-03-22 11:15:25,296 INFO [regionserver60020] regionserver.HRegionServer: reportForDuty to master=bigaperf273,60000,1427048108439 with port=60020, startcode=1427048115174
> 2015-03-22 11:15:25,299 DEBUG [regionserver60020] regionserver.HRegionServer: Master is not running yet
> 2015-03-22 11:15:25,299 WARN [regionserver60020] regionserver.HRegionServer: reportForDuty failed; sleeping and then retrying.
> 2015-03-22 11:15:28,299 INFO [regionserver60020] regionserver.HRegionServer: reportForDuty to master=bigaperf273,60000,1427048108439 with port=60020, startcode=1427048115174
> 2015-03-22 11:15:28,302 DEBUG [regionserver60020] regionserver.HRegionServer: Master is not running yet
> 2015-03-22 11:15:28,302 WARN [regionserver60020] regionserver.HRegionServer: reportForDuty failed; sleeping and then retrying.
> {noformat}
> What happened is that the region server first got
> master=bigaperf274,60000,1427045883965. Before it was able to report
> successfully, the master changed to bigaperf273,60000,1427048108439.
> We were supposed to open a new connection to the new master, but we never
> did, and kept looping and retrying the old address forever.
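
To illustrate the intended behavior described above, here is a minimal,
self-contained sketch of a retry loop that looks up the active master again on
every attempt and reconnects when the address has changed. It is not the
HBase code and not the attached patch: the class ReportForDutySketch, the
currentMaster supplier, the tryReport predicate, and the plain string
addresses are all hypothetical stand-ins for the real master tracker and RPC
stub.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Hypothetical sketch; none of these names come from HBase itself. */
final class ReportForDutySketch {

    /**
     * Retries until a report succeeds or maxAttempts is used up.
     * The master address is looked up again on every attempt, and the
     * (simulated) connection is re-pointed whenever that address changes,
     * so a master change is not answered by retrying a stale address.
     *
     * @return the address that accepted the report, or null if none did
     */
    static String reportForDuty(Supplier<String> currentMaster,
                                Predicate<String> tryReport,
                                int maxAttempts,
                                long sleepMillis) {
        String connectedTo = null; // address the simulated connection targets
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            String master = currentMaster.get(); // re-resolve on every attempt
            if (master != null) {
                if (!master.equals(connectedTo)) {
                    // First attempt or master change: replace the connection.
                    connectedTo = master;
                }
                if (tryReport.test(connectedTo)) {
                    return connectedTo;
                }
            }
            try {
                Thread.sleep(sleepMillis); // back off before the next attempt
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return null;
            }
        }
        return null;
    }
}
```

In a test, currentMaster could be backed by a list that returns the old master
address first and the new one afterwards, with tryReport accepting only the new
address. The loop should then succeed on the second attempt instead of spinning
on the old address.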