[ 
https://issues.apache.org/jira/browse/HBASE-21164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773472#comment-16773472
 ] 

Mingliang Liu commented on HBASE-21164:
---------------------------------------

Thanks for reporting this [~psomogyi]. This is interesting, as I did not expect 
{{TestMasterShutdown}} to fail; it seems unrelated to this patch. I think this 
patch is still applicable to {{branch-1}} and I appreciate the backport efforts.

I debugged a little and found that the Master is not able to stop because it 
times out waiting for the RegionServers to close. I then checked the patch and 
found the following lines in {{branch-1}}, which are not in the initial patch 
for {{trunk}} and {{branch-2}}.

{code:title=HMaster.java}
  @Override
  public void stop(String msg) {
    if (!isStopped()) {
      super.stop(msg);
      if (this.activeMasterManager != null) {
        this.activeMasterManager.stop();
      }
    }
  }
{code}

I deleted this section and the related tests now pass locally. I am attaching 
the patch for {{branch-1}} here for review and discussion. I think the code 
snippet above was brought along with this patch, but I don't have all the 
context. Maybe [~apurtell] can explain it, or it was unintentional.

Thanks,

> reportForDuty to spew less log if master is initializing
> --------------------------------------------------------
>
>                 Key: HBASE-21164
>                 URL: https://issues.apache.org/jira/browse/HBASE-21164
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: stack
>            Assignee: Mingliang Liu
>            Priority: Minor
>             Fix For: 3.0.0, 2.2.0, 2.1.1
>
>         Attachments: HBASE-21164.005.patch, HBASE-21164.006.patch, 
> HBASE-21164.007.patch, HBASE-21164.008.patch, HBASE-21164.009.patch, 
> HBASE-21164.branch-2.1.001.patch, HBASE-21164.branch-2.1.002.patch, 
> HBASE-21164.branch-2.1.003.patch, HBASE-21164.branch-2.1.004.patch
>
>
> RegionServers call reportForDuty on startup to tell the Master they are 
> available. If the Master is initializing, which can take a while on a big 
> cluster, particularly if something is amiss, the log message every three 
> seconds is annoying and serves no purpose. We should emit fewer of these 
> log messages. Here is an example:
> {code:java}
> 2018-09-06 14:01:39,312 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: reportForDuty to 
> master=vc0207.halxg.cloudera.com,22001,1536266763109 with port=22001, 
> startcode=1536266763109
> 2018-09-06 14:01:39,312 WARN 
> org.apache.hadoop.hbase.regionserver.HRegionServer: reportForDuty failed; 
> sleeping and then retrying.
> ....
> {code}
> For example, I am looking at a large cluster now that had a backlog of 
> procedure WALs. It is taking a couple of hours recreating the procedure-state 
> because there are millions of procedures outstanding. Meantime, the Master 
> log is just full of the above message – every three seconds...
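
One generic way to reduce this kind of log volume is to lengthen the sleep between retries, so that each retry (and its log line) happens less often while the Master is still initializing. The following is only a minimal sketch of capped exponential backoff; {{RetryBackoff}} and {{nextSleepMs}} are hypothetical names invented for illustration, are not part of HBase, and are not necessarily what the attached patches do.

```java
// Hypothetical helper, not part of HBase: computes how long to sleep
// before the next retry of a reportForDuty-style call.
public final class RetryBackoff {
  private RetryBackoff() {}

  /**
   * Returns baseMs doubled once per attempt, capped at maxMs.
   * attempt is zero-based: attempt 0 returns baseMs itself.
   */
  public static long nextSleepMs(long baseMs, int attempt, long maxMs) {
    if (baseMs <= 0 || maxMs < baseMs || attempt < 0) {
      throw new IllegalArgumentException("invalid backoff parameters");
    }
    long sleep = baseMs;
    for (int i = 0; i < attempt && sleep < maxMs; i++) {
      // Compare against maxMs / 2 first so the doubling cannot overflow.
      sleep = (sleep > maxMs / 2) ? maxMs : sleep * 2;
    }
    return sleep;
  }
}
```

With a base of 3000 ms (the three-second interval mentioned in the description) and an assumed cap of 60000 ms, the sleeps grow as 3 s, 6 s, 12 s, 24 s, 48 s and then stay at 60 s, so a Master that takes hours to initialize would see roughly one retry message per minute instead of one every three seconds.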



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
