[
https://issues.apache.org/jira/browse/HBASE-21164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16611622#comment-16611622
]
Mingliang Liu commented on HBASE-21164:
---------------------------------------
Thanks for the comment, [~allan163] and [~stack]. Learning from the discussion.
{quote}
I think System.currentTimeMillis() is good enough, we don't need to be so acute
when sleeping.
{quote}
The motivation is to keep system time changes from affecting elapsed time
calculations. The system time can change because users adjust the time
settings and/or because of internet time sync. I agree that a system time
change has no critical consequence here, but it is better to avoid the effect
if we can. Hadoop made a similar effort; see [HDFS-6841].
One side effect I can "imagine" if the system time changes in our case is a
spurious warning like the following in Sleeper:
{code:java}
if (slept - this.period > MINIMAL_DELTA_FOR_LOGGING) {
  LOG.warn("We slept " + slept + "ms instead of " + this.period +
      "ms, this is likely due to a long " +
      "garbage collecting pause and it's usually bad, see " +
      "http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired");
}
{code}
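For illustration, {{slept}} can be derived from a monotonic source so that a wall-clock jump cannot inflate it. This is a minimal sketch and not the patch itself: {{ElapsedTimer}} and its injectable clock are hypothetical names, while {{System.nanoTime()}} is the standard JDK monotonic source.

```java
import java.util.function.LongSupplier;

/** Hypothetical helper (not HBase's Sleeper): elapsed time from a nanosecond clock. */
final class ElapsedTimer {
  private final LongSupplier nanoClock;
  private final long startNanos;

  /** The clock is injectable so a test can simulate the passage of time. */
  ElapsedTimer(LongSupplier nanoClock) {
    this.nanoClock = nanoClock;
    this.startNanos = nanoClock.getAsLong();
  }

  /** System.nanoTime() is not affected by wall-clock adjustments. */
  static ElapsedTimer monotonic() {
    return new ElapsedTimer(System::nanoTime);
  }

  long elapsedMillis() {
    // Subtract first, then convert: nanoTime values only make sense as differences.
    return (nanoClock.getAsLong() - startNanos) / 1_000_000L;
  }
}
```

A caller would create the timer before sleeping and compare {{elapsedMillis()}} with the period afterwards, instead of subtracting two {{System.currentTimeMillis()}} readings.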
{quote}
There is a facility to wake this.sleeper. Could call from stop/abort?
{quote}
Can do that. As long as the wake-up comes after {{this.stopped = true;}}, the
sleeper should respect it.
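The ordering matters because a woken sleeper re-checks the flag. A rough sketch of the idea, assuming a plain monitor-based sleeper: {{WakeableSleeper}} is a hypothetical stand-in and not the actual Sleeper class.

```java
/** Hypothetical sketch (not HBase's Sleeper): a sleep that stop() can cut short. */
final class WakeableSleeper {
  private final Object lock = new Object();
  private boolean stopped = false; // guarded by lock

  /** Sleeps up to periodMs; returns true if stop() has been called. */
  boolean sleep(long periodMs) {
    long deadline = System.nanoTime() + periodMs * 1_000_000L;
    synchronized (lock) {
      while (!stopped) {
        long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
        if (remainingMs <= 0) {
          break; // the full period has elapsed
        }
        try {
          lock.wait(remainingMs);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt(); // preserve the interrupt for the caller
          break;
        }
      }
      return stopped;
    }
  }

  void stop() {
    synchronized (lock) {
      stopped = true;   // set the flag first ...
      lock.notifyAll(); // ... then wake the sleeper so it observes the flag
    }
  }
}
```

If the wake-up came before the flag was set, the sleeper would see {{stopped == false}} and go back to waiting for the rest of the period.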
{quote}
Could just have a max of a minute or two.
{quote}
One minute seems an easier pill to swallow here?
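To make the numbers concrete, here is a minimal sketch of a capped exponential backoff, assuming the 3 second default as the base and one minute as the cap. {{ReportBackoff}} and {{delayMs}} are hypothetical names for illustration, not what the patch uses.

```java
/** Hypothetical helper: doubles the retry delay per failure, up to a cap. */
final class ReportBackoff {
  private ReportBackoff() {}

  /**
   * @param baseMs   delay before the first retry (assumed 3000 here)
   * @param maxMs    upper bound on the delay (assumed 60000 here)
   * @param failures number of consecutive failed attempts so far
   */
  static long delayMs(long baseMs, long maxMs, int failures) {
    long delay = baseMs;
    // Stop doubling once the cap is reached, so the value cannot grow unbounded.
    for (int i = 0; i < failures && delay < maxMs; i++) {
      delay *= 2;
    }
    return Math.min(delay, maxMs);
  }
}
```

With these assumed values the delays run 3s, 6s, 12s, 24s, 48s and then stay at 60s, so a Master that takes hours to initialize sees about one retry per minute from each RegionServer rather than one every three seconds.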
> reportForDuty should do (exponential) backoff rather than retry every 3
> seconds (default).
> ------------------------------------------------------------------------------------------
>
> Key: HBASE-21164
> URL: https://issues.apache.org/jira/browse/HBASE-21164
> Project: HBase
> Issue Type: Improvement
> Components: regionserver
> Reporter: stack
> Assignee: Mingliang Liu
> Priority: Minor
> Attachments: HBASE-21164.005.patch, HBASE-21164.branch-2.1.001.patch,
> HBASE-21164.branch-2.1.002.patch, HBASE-21164.branch-2.1.003.patch,
> HBASE-21164.branch-2.1.004.patch
>
>
> RegionServers do reportForDuty on startup to tell Master they are available.
> If Master is initializing, which can take a while on a big cluster and
> particularly if something is amiss, the log message every three seconds is
> annoying and of no use. Back off on failure, up to a reasonable maximum
> period. Here is an example:
> {code}
> 2018-09-06 14:01:39,312 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: reportForDuty to
> master=vc0207.halxg.cloudera.com,22001,1536266763109 with port=22001,
> startcode=1536266763109
> 2018-09-06 14:01:39,312 WARN
> org.apache.hadoop.hbase.regionserver.HRegionServer: reportForDuty failed;
> sleeping and then retrying.
> ....
> {code}
> For example, I am looking at a large cluster now that had a backlog of
> procedure WALs. It is taking a couple of hours recreating the procedure-state
> because there are millions of procedures outstanding. Meantime, the Master
> log is just full of the above message -- every three seconds...