[
https://issues.apache.org/jira/browse/HBASE-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jonathan Gray updated HBASE-3168:
---------------------------------
Attachment: HBASE-3168-v5.patch
Final patch from RB. One last change: previously the HRS would just loop,
repeatedly trying to check in to the master, if its clock was out of sync. Now
we catch RemoteException and, if the cause is clock skew, rethrow the
exception, which aborts the RS.
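The region server side of this change can be sketched roughly as follows. This is a minimal illustration, not the code in the patch: StubRemoteException stands in for the RPC wrapper exception, and MasterStub, CheckInLoop, reportForDuty and the class-name match are hypothetical names chosen for the example.

```java
import java.io.IOException;

/** Stand-in for an RPC wrapper exception (assumption: it carries the remote class name). */
class StubRemoteException extends IOException {
    private final String className;

    StubRemoteException(String className, String message) {
        super(message);
        this.className = className;
    }

    String getClassName() {
        return className;
    }
}

/** Hypothetical view of the master as seen by a region server at startup. */
interface MasterStub {
    void regionServerStartup(long regionServerTimeMillis) throws IOException;
}

class CheckInLoop {
    // Hypothetical name for the "ClockOutOfSync-like" exception raised by the master.
    static final String CLOCK_SKEW_CLASS = "ClockOutOfSyncException";

    /** Tries to check in; returns the number of attempts used on success. */
    static int reportForDuty(MasterStub master, int maxAttempts) throws IOException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                master.regionServerStartup(System.currentTimeMillis());
                return attempt;
            } catch (StubRemoteException e) {
                if (e.getClassName().endsWith(CLOCK_SKEW_CLASS)) {
                    // Clock skew will not fix itself by retrying: rethrow so the caller aborts.
                    throw e;
                }
                // Any other remote failure is treated as transient; a real loop would sleep here.
            }
        }
        throw new IOException("Could not check in after " + maxAttempts + " attempts");
    }
}
```

The point of the sketch is only the control flow: clock skew is rethrown on the first occurrence instead of being retried, while other remote failures keep the old retry behaviour.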
> Sanity date and time check when a region server joins the cluster
> -----------------------------------------------------------------
>
> Key: HBASE-3168
> URL: https://issues.apache.org/jira/browse/HBASE-3168
> Project: HBase
> Issue Type: Improvement
> Components: regionserver
> Affects Versions: 0.89.20100924
> Environment: RHEL 5.5 64bit, 1 Master 4 Region Servers
> Reporter: Jeff Whiting
> Assignee: Jeff Whiting
> Fix For: 0.90.0
>
> Attachments: HBASE-3168-trunk-v1.txt, HBASE-3168-trunk-v2.txt,
> HBASE-3168-trunk-v3.txt, HBASE-3168-v4.patch, HBASE-3168-v5.patch
>
>
> Introduce a sanity check when an RS joins the cluster to make sure its clock
> is not too far out of sync with the rest of the cluster. If the RS's clock is
> skewed too far, the master would prevent it from joining, and the RS would
> log the error and die.
> Having an RS with even small differences in time can cause huge problems due
> to how HBase stores values with timestamps.
> According to J-D, in ServerManager we are already doing:
> {code}
> HServerInfo info = new HServerInfo(serverInfo);
> checkIsDead(info.getServerName(), "STARTUP");
> checkAlreadySameHostPort(info);
> recordNewServer(info, false, null);
> {code}
> The new check would fit in nicely there.
> JG suggests we add a "ClockOutOfSync-like exception".
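For illustration, the master-side check described above might look something like the sketch below. It is an assumption-laden example, not the ServerManager code: the ClockSkewCheck helper, its check method, the maxSkewMillis threshold, and the exact exception name are all hypothetical.

```java
import java.io.IOException;

/** Hypothetical exception type, named after JG's "ClockOutOfSync-like" suggestion. */
class ClockOutOfSyncException extends IOException {
    ClockOutOfSyncException(String message) {
        super(message);
    }
}

/** Hypothetical helper a master could call before recording a new server. */
class ClockSkewCheck {
    private final long maxSkewMillis; // assumed to be a configurable threshold

    ClockSkewCheck(long maxSkewMillis) {
        this.maxSkewMillis = maxSkewMillis;
    }

    /** Throws if the reported server time differs from the master time by more than the threshold. */
    void check(String serverName, long serverTimeMillis, long masterTimeMillis)
            throws ClockOutOfSyncException {
        long skew = Math.abs(masterTimeMillis - serverTimeMillis);
        if (skew > maxSkewMillis) {
            throw new ClockOutOfSyncException("Server " + serverName
                + " rejected: reported time is out of sync with master by " + skew
                + "ms (max allowed " + maxSkewMillis + "ms)");
        }
    }
}
```

In the startup sequence quoted above, a call such as `check(info.getServerName(), reportedTime, System.currentTimeMillis())` would presumably sit before `recordNewServer`, so that a skewed server is never recorded; how the region server's time reaches the master is left open here.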