[
https://issues.apache.org/jira/browse/HBASE-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12929667#action_12929667
]
Jonathan Gray commented on HBASE-3168:
--------------------------------------
Looks pretty good, Jeff. I have a few comments. If you want, put your diff up
on review board (http://review.cloudera.org) but will give you my comments here
for now:
- The unit test does all the work of starting/stopping an entire cluster, but
then you just go behind the scenes and use ServerManager directly. You should
skip the cluster stuff and just mock up a ServerManager if that's the only
thing we need to test. (but I do appreciate a unit test!) :)
- There looks to be a few lines that are > 80 characters (like method
signatures in HMasterRegionInterface, HMaster, and elsewhere. some stuff in
checkClockSkew)
- In HMaster, just remove the comment (and the @see) and instead use
@Override... this will take care of javadoc and is best form i think
- It's not ideal to have to pull from Configuration object on each call to the
check for skew, but this is fairly rare so not a big deal.
I can fix most of this stuff on commit. Can you just address the unit test?
Let's make it as simple/fast as possible, I think it's fine to just have a
server-side unit test (ServerManager level) rather than a full cluster test.
> Sanity date and time check when a region server joins the cluster
> -----------------------------------------------------------------
>
> Key: HBASE-3168
> URL: https://issues.apache.org/jira/browse/HBASE-3168
> Project: HBase
> Issue Type: Improvement
> Components: regionserver
> Affects Versions: 0.89.20100924
> Environment: RHEL 5.5 64bit, 1 Master 4 Region Servers
> Reporter: Jeff Whiting
> Assignee: Jeff Whiting
> Fix For: 0.90.0
>
> Attachments: HBASE-3168-trunk-v1.txt, HBASE-3168-trunk-v2.txt
>
>
> Introduce a sanity check when a RS joins the cluster to make sure its clock
> isn't too far out of skew with the rest of the cluster. If the RS's time is
> too far out of skew then the master would prevent it from joining and RS
> would die and log the error.
> Having a RS with even small differences in time can cause huge problems due
> to how bhase stores values with timestamps.
> According to J-D in ServerManager we are already doing:
> {code}
> HServerInfo info = new HServerInfo(serverInfo);
> checkIsDead(info.getServerName(), "STARTUP");
> checkAlreadySameHostPort(info);
> recordNewServer(info, false, null);
> {code}
> And that the new check would fit in nicely there.
> JG suggests we add a "ClockOutOfSync-like exception"
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.