[ https://issues.apache.org/jira/browse/HBASE-21164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16611556#comment-16611556 ]
Allan Yang commented on HBASE-21164:
------------------------------------

{code}
 while (keepLooping()) {
   RegionServerStartupResponse w = reportForDuty();
   if (w == null) {
-    LOG.warn("reportForDuty failed; sleeping and then retrying.");
-    this.sleeper.sleep();
+    long sleepTime = rc.getBackoffTimeAndIncrementAttempts();
+    LOG.warn("reportForDuty failed; sleeping {} ms and then retrying.", sleepTime);
+    this.sleeper.sleep(sleepTime);
   } else {
{code}

I don't think backing off here is a good idea. If the sleep time is too long, we will spend a long time waiting for the RegionServer to stop when it is shut down.
Another point: I think System.currentTimeMillis() is good enough; we don't need to be so precise when sleeping.

> reportForDuty should do (exponential) backoff rather than retry every 3 seconds (default).
> ------------------------------------------------------------------------------------------
>
> Key: HBASE-21164
> URL: https://issues.apache.org/jira/browse/HBASE-21164
> Project: HBase
> Issue Type: Improvement
> Components: regionserver
> Reporter: stack
> Assignee: Mingliang Liu
> Priority: Minor
> Attachments: HBASE-21164.005.patch, HBASE-21164.branch-2.1.001.patch, HBASE-21164.branch-2.1.002.patch, HBASE-21164.branch-2.1.003.patch, HBASE-21164.branch-2.1.004.patch
>
>
> RegionServers do reportForDuty on startup to tell the Master they are available.
> If the Master is initializing, especially on a big cluster where that can take a while (particularly if something is amiss), logging this every three seconds is annoying and accomplishes nothing useful. If reportForDuty fails, back off up to a reasonable maximum period. Here is an example:
> {code}
> 2018-09-06 14:01:39,312 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: reportForDuty to master=vc0207.halxg.cloudera.com,22001,1536266763109 with port=22001, startcode=1536266763109
> 2018-09-06 14:01:39,312 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: reportForDuty failed; sleeping and then retrying.
> ....
> {code}
> For example, I am looking at a large cluster right now that had a backlog of procedure WALs. It is taking a couple of hours to recreate the procedure state because there are millions of outstanding procedures. Meanwhile, the Master log is full of the above message, every three seconds.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
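One possible way to reconcile the issue's request (back off up to a reasonable maximum) with the shutdown concern raised in the comment is to make the backoff sleep end as soon as a stop is requested. The following is a minimal sketch under stated assumptions, not the patch under review: `CappedBackoff`, `StoppableSleeper` and `BackoffDemo.retryUntil` are hypothetical names, not HBase's `RetryCounter` or `Sleeper`, and the 100 ms base / 1,000 ms cap are illustrative values only.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical classes for this sketch, not part of HBase's API.
final class BackoffDemo {
  // Hypothetical loop shaped like the patched reportForDuty loop. Returns the
  // attempt number that succeeded, or -1 if the sleeper was stopped first.
  static int retryUntil(BooleanSupplier tryReport, CappedBackoff backoff,
      StoppableSleeper sleeper) {
    int attempt = 0;
    while (!sleeper.isStopped()) {
      attempt++;
      if (tryReport.getAsBoolean()) {
        backoff.reset(); // the next failure starts from the base delay again
        return attempt;
      }
      if (sleeper.sleep(backoff.nextDelayMs())) {
        break; // stop requested: exit without finishing the backoff delay
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    CappedBackoff backoff = new CappedBackoff(100, 1_000);
    StringBuilder delays = new StringBuilder();
    for (int i = 0; i < 6; i++) {
      delays.append(backoff.nextDelayMs()).append(' ');
    }
    System.out.println("delays: " + delays.toString().trim()); // 100 200 400 800 1000 1000

    // A stop() from another thread ends even a 60 s backoff sleep right away.
    StoppableSleeper sleeper = new StoppableSleeper();
    long start = System.nanoTime();
    new Thread(sleeper::stop).start();
    int result = retryUntil(() -> false, new CappedBackoff(60_000, 60_000), sleeper);
    long waitedMs = (System.nanoTime() - start) / 1_000_000;
    System.out.println("result=" + result + ", waited ~" + waitedMs + " ms");
  }
}

// Hypothetical capped exponential backoff (not HBase's RetryCounter).
final class CappedBackoff {
  private final long baseMs;
  private final long maxMs;
  private int attempts = 0;

  CappedBackoff(long baseMs, long maxMs) {
    if (baseMs <= 0 || maxMs < baseMs) {
      throw new IllegalArgumentException("need 0 < baseMs <= maxMs");
    }
    this.baseMs = baseMs;
    this.maxMs = maxMs;
  }

  // Returns min(baseMs * 2^attempts, maxMs), then counts the attempt.
  long nextDelayMs() {
    int shift = attempts;
    if (attempts < 62) {
      attempts++; // bounded so the shift below stays well-defined
    }
    // baseMs > (maxMs >> shift) means baseMs << shift would exceed the cap;
    // checking this first also avoids long overflow.
    return baseMs > (maxMs >> shift) ? maxMs : baseMs << shift;
  }

  void reset() {
    attempts = 0;
  }
}

// Hypothetical sleeper whose sleep ends as soon as stop() is called.
final class StoppableSleeper {
  private final CountDownLatch stopped = new CountDownLatch(1);

  // Waits up to ms; returns true if woken early by stop() or an interrupt.
  boolean sleep(long ms) {
    try {
      return stopped.await(ms, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt status
      return true;
    }
  }

  void stop() {
    stopped.countDown();
  }

  boolean isStopped() {
    return stopped.getCount() == 0;
  }
}
```

In this sketch the cap bounds how long any single retry interval can grow, while the latch keeps the stop path independent of that interval, so a long backoff does not by itself delay shutdown. Whether HBase's own `Sleeper` already wakes early on stop is not shown in this thread and would need to be checked against the source.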