[ 
https://issues.apache.org/jira/browse/HBASE-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Whiting updated HBASE-3168:
--------------------------------

    Attachment: HBASE-3168-trunk-v2.txt

Here is the latest patch with the new parameter "serverCurrentTime" added to 
regionServerStartup.  This patch was made against trunk.

A summary of some of the changes:
- Added "serverCurrentTime" to regionServerStartup.
- Incremented HBaseRPCProtocolVersion because of the new parameter.  
I'm unsure whether this is a "big deal".
- Added a new configuration parameter, hbase.master.regionserver.maxClockSkewMS, 
which defaults to 30 seconds (this may be too large).
- Added a new exception: ClockOutOfSyncException.
- Added TestClockSkewDetection to test the implementation.
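
To illustrate the idea behind the check, here is a minimal standalone sketch. It 
is not the code from the patch: the class ClockSkewSketch, the helper methods 
isWithinSkew and checkClockSkew, and the nested exception class are hypothetical 
stand-ins. Only the "serverCurrentTime" parameter name, the exception name, and 
the 30 second default come from the summary above.

```java
import java.io.IOException;

// Illustrative sketch only; none of this is taken from the attached patch.
class ClockSkewSketch {

    /** Hypothetical stand-in for the ClockOutOfSyncException named above. */
    static class ClockOutOfSyncException extends IOException {
        ClockOutOfSyncException(String message) {
            super(message);
        }
    }

    /** Mirrors the 30 second default described for maxClockSkewMS. */
    static final long DEFAULT_MAX_CLOCK_SKEW_MS = 30_000L;

    /** Returns true when the two clocks differ by at most maxSkewMs. */
    static boolean isWithinSkew(long serverCurrentTime, long masterCurrentTime,
                                long maxSkewMs) {
        return Math.abs(masterCurrentTime - serverCurrentTime) <= maxSkewMs;
    }

    /** Rejects a starting region server whose clock is too far off. */
    static void checkClockSkew(String serverName, long serverCurrentTime,
                               long masterCurrentTime, long maxSkewMs)
            throws ClockOutOfSyncException {
        if (!isWithinSkew(serverCurrentTime, masterCurrentTime, maxSkewMs)) {
            long skew = Math.abs(masterCurrentTime - serverCurrentTime);
            throw new ClockOutOfSyncException("Server " + serverName
                + " rejected: clock skew of " + skew
                + "ms exceeds the maximum of " + maxSkewMs + "ms");
        }
    }

    public static void main(String[] args) {
        long masterNow = System.currentTimeMillis();
        try {
            // A region server whose clock is 45 seconds behind the master.
            checkClockSkew("rs-example", masterNow - 45_000L, masterNow,
                           DEFAULT_MAX_CLOCK_SKEW_MS);
            System.out.println("Server accepted");
        } catch (ClockOutOfSyncException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The sketch compares absolute differences, so a region server that is ahead of 
the master is treated the same as one that is behind.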
  

> Sanity date and time check when a region server joins the cluster
> -----------------------------------------------------------------
>
>                 Key: HBASE-3168
>                 URL: https://issues.apache.org/jira/browse/HBASE-3168
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>    Affects Versions: 0.89.20100924
>         Environment: RHEL 5.5 64bit, 1 Master 4 Region Servers
>            Reporter: Jeff Whiting
>            Assignee: Jeff Whiting
>             Fix For: 0.90.0
>
>         Attachments: HBASE-3168-trunk-v1.txt, HBASE-3168-trunk-v2.txt
>
>
> Introduce a sanity check when a RS joins the cluster to make sure its clock 
> isn't too far out of sync with the rest of the cluster.  If the RS's clock is 
> skewed too far, the master would prevent it from joining, and the RS 
> would log the error and die. 
> Having a RS with even small differences in time can cause huge problems due 
> to how HBase stores values with timestamps.
> According to J-D, ServerManager already does the following: 
> {code}
>     HServerInfo info = new HServerInfo(serverInfo);
>     checkIsDead(info.getServerName(), "STARTUP");
>     checkAlreadySameHostPort(info);
>     recordNewServer(info, false, null);
> {code}
> The new check would fit in nicely there.
> JG suggests we add a "ClockOutOfSync-like exception".

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
