[
https://issues.apache.org/jira/browse/HBASE-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927001#action_12927001
]
Jonathan Gray commented on HBASE-3168:
--------------------------------------
I'm not following your logic, Michael. Are you saying that because it's
possible to keep a cluster's clocks properly synchronized, there is no need
to verify that they actually are?
bq. When you start a Unix Server, you should have it configured to check with
an NTP server and the cluster should all point to the same NTP server.
Yes, you _should_ do those things. But sometimes it's not done, and sometimes
you think you are doing it but there's a problem anyway.
The question is not whether it's possible to sync times (for the most part, it
is possible). This JIRA is about detecting clock skew at startup, so that if
clocks are out of sync we log warnings and prevent an out-of-sync RS from
joining the cluster, rather than just letting everything go.
In a recent case here, four servers for some reason came up with an odd NTP
command-line option that prevented them from syncing properly. Whether or not
something else detects that, I think it makes sense for HBase to do a
sanity check.
This is in the same way that when using LZO on your HBase cluster, you
certainly _should_ have LZO properly installed everywhere. But that's not
always the case and it's a good idea to detect it rather than wait until it
bites you later.
> Sanity date and time check when a region server joins the cluster
> -----------------------------------------------------------------
>
> Key: HBASE-3168
> URL: https://issues.apache.org/jira/browse/HBASE-3168
> Project: HBase
> Issue Type: Improvement
> Components: regionserver
> Affects Versions: 0.89.20100924
> Environment: RHEL 5.5 64bit, 1 Master 4 Region Servers
> Reporter: Jeff Whiting
>
> Introduce a sanity check when a RS joins the cluster to make sure its clock
> isn't too far out of sync with the rest of the cluster. If the RS's time is
> too far out of sync, the master would prevent it from joining and the RS
> would log the error and die.
> Having a RS with even small differences in time can cause huge problems due
> to how HBase stores values with timestamps.
> According to J-D, in ServerManager we are already doing:
> {code}
> HServerInfo info = new HServerInfo(serverInfo);
> checkIsDead(info.getServerName(), "STARTUP");
> checkAlreadySameHostPort(info);
> recordNewServer(info, false, null);
> {code}
> J-D says the new check would fit in nicely there.
> JG suggests we add a "ClockOutOfSync-like exception".
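
The check described above could look roughly like the following minimal sketch. Everything in it is hypothetical and for illustration only: the class name, the exception type, the two thresholds, and the idea of a softer warning threshold are assumptions, not existing HBase code. It assumes the RS reports its current time when it checks in and the master compares that against its own clock. The exception is unchecked here only to keep the sketch short.

```java
// Hypothetical sketch of a startup clock-skew check; none of these names
// are taken from the HBase code base.
final class ClockSkewCheck {

  /** Hypothetical "ClockOutOfSync-like" exception raised when a server is rejected. */
  static final class ClockOutOfSyncException extends RuntimeException {
    ClockOutOfSyncException(String message) {
      super(message);
    }
  }

  private final long maxSkewMs;   // assumed hard limit: reject the server above this
  private final long warnSkewMs;  // assumed soft limit: warn above this

  ClockSkewCheck(long maxSkewMs, long warnSkewMs) {
    this.maxSkewMs = maxSkewMs;
    this.warnSkewMs = warnSkewMs;
  }

  /**
   * Compares the time reported by a joining server with the master's time.
   * Note that the measured difference also includes any delay between the
   * server reading its clock and the master reading its own.
   *
   * @return true if the skew is tolerated but large enough to warn about
   * @throws ClockOutOfSyncException if the skew exceeds the hard limit
   */
  boolean check(String serverName, long serverTimeMs, long masterTimeMs) {
    long skewMs = Math.abs(masterTimeMs - serverTimeMs);
    if (skewMs > maxSkewMs) {
      throw new ClockOutOfSyncException("Server " + serverName
          + " rejected: clock skew " + skewMs + "ms exceeds max " + maxSkewMs + "ms");
    }
    return skewMs > warnSkewMs;
  }
}
```

In the ServerManager snippet quoted above, a call like this would presumably sit next to checkIsDead and checkAlreadySameHostPort, before recordNewServer, so that a rejected RS is never recorded.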