[
https://issues.apache.org/jira/browse/HBASE-21164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773472#comment-16773472
]
Mingliang Liu commented on HBASE-21164:
---------------------------------------
Thanks for reporting this, [~psomogyi]. This is interesting: I did not expect
{{TestMasterShutdown}} to fail, as it seems unrelated to this patch. I think
this patch is still applicable to {{branch-1}}, and I appreciate the backport
effort. I debugged a little and found that the Master is unable to stop because
it times out waiting for the RegionServers to close. I then checked the patch
and found the following lines in the {{branch-1}} version that are not in the
initial patch for {{trunk}} and {{branch-2}}.
{code:title=HMaster.java}
@Override
public void stop(String msg) {
  if (!isStopped()) {
    super.stop(msg);
    if (this.activeMasterManager != null) {
      this.activeMasterManager.stop();
    }
  }
}
{code}
After I deleted this section, the related tests passed locally. I am attaching
the patch for {{branch-1}} here for review and discussion. I think the code
snippet above was brought in along with this patch, but I don't have all the
context. Maybe [~apurtell] can explain it, or it was unintentional.
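For readers without the {{branch-1}} source at hand, the shape of the override quoted above can be modeled in isolation. This is only a toy sketch: {{ToyServer}}, {{ToyMaster}} and {{ToyActiveMasterManager}} are hypothetical stand-ins, not HBase classes, and the sketch does not reproduce the shutdown timeout. It shows only that the {{isStopped()}} guard makes {{stop}} idempotent, so the active-master manager is stopped at most once.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the active-master manager; counts stop() calls.
class ToyActiveMasterManager {
    int stopCalls = 0;

    void stop() {
        stopCalls++;
    }
}

// Hypothetical stand-in for the parent server class that owns the stopped flag.
class ToyServer {
    private final AtomicBoolean stopped = new AtomicBoolean(false);

    boolean isStopped() {
        return stopped.get();
    }

    void stop(String msg) {
        stopped.set(true);
    }
}

// Mirrors the structure of the quoted override, nothing more.
class ToyMaster extends ToyServer {
    final ToyActiveMasterManager activeMasterManager = new ToyActiveMasterManager();

    @Override
    void stop(String msg) {
        // Guard: only the first call does any work.
        if (!isStopped()) {
            super.stop(msg);
            if (this.activeMasterManager != null) {
                this.activeMasterManager.stop();
            }
        }
    }
}
```

In this model, calling {{stop}} twice on a {{ToyMaster}} leaves it stopped and invokes the manager's {{stop}} only once. Whether stopping the manager at that point is what delays the real Master's shutdown is not something the sketch can show.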
Thanks,
> reportForDuty to spew less log if master is initializing
> --------------------------------------------------------
>
> Key: HBASE-21164
> URL: https://issues.apache.org/jira/browse/HBASE-21164
> Project: HBase
> Issue Type: Improvement
> Components: regionserver
> Reporter: stack
> Assignee: Mingliang Liu
> Priority: Minor
> Fix For: 3.0.0, 2.2.0, 2.1.1
>
> Attachments: HBASE-21164.005.patch, HBASE-21164.006.patch,
> HBASE-21164.007.patch, HBASE-21164.008.patch, HBASE-21164.009.patch,
> HBASE-21164.branch-2.1.001.patch, HBASE-21164.branch-2.1.002.patch,
> HBASE-21164.branch-2.1.003.patch, HBASE-21164.branch-2.1.004.patch
>
>
> RegionServers do reportForDuty on startup to tell Master they are available.
> If Master is initializing, which can take a while on a big cluster,
> particularly if something is amiss, the message logged every three seconds is
> annoying and of no use. We should spew fewer of these logs. Here is an
> example:
> {code:java}
> 2018-09-06 14:01:39,312 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: reportForDuty to
> master=vc0207.halxg.cloudera.com,22001,1536266763109 with port=22001,
> startcode=1536266763109
> 2018-09-06 14:01:39,312 WARN
> org.apache.hadoop.hbase.regionserver.HRegionServer: reportForDuty failed;
> sleeping and then retrying.
> ....
> {code}
> For example, I am looking at a large cluster now that had a backlog of
> procedure WALs. It is taking a couple of hours to recreate the procedure
> state because there are millions of procedures outstanding. In the meantime,
> the Master log is just full of the above message – every three seconds...