[ https://issues.apache.org/jira/browse/HBASE-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928117#action_12928117 ]
stack commented on HBASE-3168:
------------------------------
@Jeff
#1 could be a legitimate problem in the case where a regionserver came up but
there was no master to connect to, so the regionserver just hung out twiddling
its thumbs for five or ten minutes.
#2 is not an issue. You say "If each region server then calls
reportsForDuty...". That's not what happens. A regionserver, when it comes up,
calls reportForDuty/regionServerStartup. Thereafter, it heartbeats by calling
regionServerReport (until it dies). When a master joins an already running
cluster, the regionservers just call the new master's regionServerReport,
not the initializing regionServerStartup, and the master registers the
regionserver at that time. (TODO: either do away with regionServerStartup or,
when a new master joins the cluster, have the regionserver call
regionServerStartup rather than regionServerReport. In the interests of
simplicity, it seems regionServerStartup is no longer necessary, so we should
just axe it.)
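To make the flow above concrete, here is a rough sketch of a master that registers a regionserver on whichever call it sees first. MasterSketch and its fields are made-up names for illustration only; this is not the actual ServerManager code, and the real RPCs carry far more than a server name.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical, simplified stand-in for the master's server tracking. */
class MasterSketch {
    // serverName -> master-side time of the last call from that server
    private final Map<String, Long> onlineServers = new ConcurrentHashMap<>();

    /** Called once by a regionserver when it first comes up. */
    void regionServerStartup(String serverName) {
        onlineServers.put(serverName, System.currentTimeMillis());
    }

    /**
     * Heartbeat. A master that has never seen this server (for example, a
     * master that joined an already running cluster) registers it here.
     * Returns true if this report caused the registration.
     */
    boolean regionServerReport(String serverName) {
        Long previous = onlineServers.put(serverName, System.currentTimeMillis());
        return previous == null;
    }

    boolean isOnline(String serverName) {
        return onlineServers.containsKey(serverName);
    }
}
```

In this sketch a new master never sees regionServerStartup for servers that were already running, yet still ends up tracking them after their first heartbeat.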
I like Jon's suggestion of changing the signature of reportForDuty to add a
regionServerCurrentTimeMillis param.
You might argue that regionServerReport should also be modified to take the
regionserver timestamp, but that's probably overdoing it.
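A minimal sketch of what the master-side check could look like once the startup call carries the regionserver's time. The class names, the method signature, and the maxSkewMillis threshold are all assumptions for illustration; the exception name just follows the "ClockOutOfSync-like" suggestion in the issue.

```java
/** Hypothetical exception, named after the "ClockOutOfSync-like" suggestion. */
class ClockOutOfSyncException extends Exception {
    ClockOutOfSyncException(String message) {
        super(message);
    }
}

/** Hypothetical helper; in practice the check would sit in the startup path. */
class ClockSkewCheck {
    /**
     * Rejects a regionserver whose reported clock differs from the master's
     * clock by more than maxSkewMillis (an assumed, configurable limit).
     */
    static void checkClockSkew(String serverName,
                               long regionServerCurrentTimeMillis,
                               long masterCurrentTimeMillis,
                               long maxSkewMillis) throws ClockOutOfSyncException {
        long skew = Math.abs(masterCurrentTimeMillis - regionServerCurrentTimeMillis);
        if (skew > maxSkewMillis) {
            throw new ClockOutOfSyncException("Server " + serverName
                + " rejected: clock skew " + skew + "ms exceeds max "
                + maxSkewMillis + "ms");
        }
    }
}
```

The regionserver would catch the exception on startup, log it, and shut down. Since network latency between the two timestamps shows up as skew, the limit would need to be generous.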
Thanks for working on this.
> Sanity date and time check when a region server joins the cluster
> -----------------------------------------------------------------
>
> Key: HBASE-3168
> URL: https://issues.apache.org/jira/browse/HBASE-3168
> Project: HBase
> Issue Type: Improvement
> Components: regionserver
> Affects Versions: 0.89.20100924
> Environment: RHEL 5.5 64bit, 1 Master 4 Region Servers
> Reporter: Jeff Whiting
> Fix For: 0.90.0
>
> Attachments: HBASE-3168-trunk-v1.txt
>
>
> Introduce a sanity check when a RS joins the cluster to make sure its clock
> isn't skewed too far from the rest of the cluster. If the RS's clock is
> skewed too far, the master would prevent it from joining and the RS
> would log the error and die.
> Having a RS with even small differences in time can cause huge problems due
> to how HBase stores values with timestamps.
> According to J-D, in ServerManager we are already doing:
> {code}
> HServerInfo info = new HServerInfo(serverInfo);
> checkIsDead(info.getServerName(), "STARTUP");
> checkAlreadySameHostPort(info);
> recordNewServer(info, false, null);
> {code}
> J-D says the new check would fit in nicely there.
> JG suggests we add a "ClockOutOfSync-like exception".