[ 
https://issues.apache.org/jira/browse/HBASE-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12929667#action_12929667
 ] 

Jonathan Gray commented on HBASE-3168:
--------------------------------------

Looks pretty good, Jeff.  I have a few comments.  If you want, put your diff up 
on review board (http://review.cloudera.org) but will give you my comments here 
for now:

- The unit test does all the work of starting/stopping an entire cluster, but 
then you just go behind the scenes and use ServerManager directly.  You should 
skip the cluster stuff and just mock up a ServerManager if that's the only 
thing we need to test.  (but I do appreciate a unit test!)  :)

- There looks to be a few lines that are > 80 characters (like method 
signatures in HMasterRegionInterface, HMaster, and elsewhere.  some stuff in 
checkClockSkew)

- In HMaster, just remove the comment (and the @see) and instead use 
@Override... this will take care of javadoc and is best form i think

- It's not ideal to have to pull from Configuration object on each call to the 
check for skew, but this is fairly rare so not a big deal.

I can fix most of this stuff on commit.  Can you just address the unit test?  
Let's make it as simple/fast as possible, I think it's fine to just have a 
server-side unit test (ServerManager level) rather than a full cluster test.

> Sanity date and time check when a region server joins the cluster
> -----------------------------------------------------------------
>
>                 Key: HBASE-3168
>                 URL: https://issues.apache.org/jira/browse/HBASE-3168
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>    Affects Versions: 0.89.20100924
>         Environment: RHEL 5.5 64bit, 1 Master 4 Region Servers
>            Reporter: Jeff Whiting
>            Assignee: Jeff Whiting
>             Fix For: 0.90.0
>
>         Attachments: HBASE-3168-trunk-v1.txt, HBASE-3168-trunk-v2.txt
>
>
> Introduce a sanity check when a RS joins the cluster to make sure its clock 
> isn't too far out of skew with the rest of the cluster.  If the RS's time is 
> too far out of skew then the master would prevent it from joining and RS 
> would die and log the error. 
> Having a RS with even small differences in time can cause huge problems due 
> to how bhase stores values with timestamps.
> According to J-D in ServerManager we are already doing: 
> {code}
>     HServerInfo info = new HServerInfo(serverInfo);
>     checkIsDead(info.getServerName(), "STARTUP");
>     checkAlreadySameHostPort(info);
>     recordNewServer(info, false, null);
> {code}
> And that the new check would fit in nicely there.
> JG suggests we add a "ClockOutOfSync-like exception"

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to