[ 
https://issues.apache.org/jira/browse/HBASE-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930393#action_12930393
 ] 

HBase Review Board commented on HBASE-3168:
-------------------------------------------

Message from: "Jonathan Gray" <[email protected]>

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1193/
-----------------------------------------------------------

(Updated 2010-11-09 16:48:36.816438)


Review request for hbase and stack.


Changes
-------

Over in the RS, if we get a RemoteException that wraps a ClockOutOfSyncException, 
we now re-throw the IOE so that the HRS actually aborts.  Without this, the HRS 
would fail to start up, but instead of killing itself it would loop, retrying 
its check-in with the master forever.
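
A rough sketch of that retry-versus-abort decision follows. It is not the 
committed HRegionServer code: ReportForDutySketch, its nested RemoteException 
and ClockOutOfSyncException stand-ins, the Master interface, and the 
maxAttempts bound are all hypothetical names used only for illustration.

```java
import java.io.IOException;

/** Illustrative sketch only; every name here is an assumption, not HBase API. */
class ReportForDutySketch {

  /** Stand-in for the ClockOutOfSyncException added by the patch. */
  static class ClockOutOfSyncException extends IOException {
    ClockOutOfSyncException(String message) { super(message); }
  }

  /** Simplified stand-in for an RPC-layer exception that carries the remote class name. */
  static class RemoteException extends IOException {
    private final String className;

    RemoteException(String className, String message) {
      super(message);
      this.className = className;
    }

    /** Rebuilds the exception types we care about; otherwise returns this. */
    IOException unwrap() {
      if (ClockOutOfSyncException.class.getName().equals(className)) {
        return new ClockOutOfSyncException(getMessage());
      }
      return this;
    }
  }

  /** Hypothetical view of the master's startup RPC. */
  interface Master {
    void regionServerStartup(long serverTimeMs) throws IOException;
  }

  /**
   * Tries to check in with the master. Clock skew is treated as fatal and
   * re-thrown so the caller can abort; other remote failures are retried.
   * Returns the number of attempts it took to succeed.
   */
  static int reportForDuty(Master master, long nowMs, int maxAttempts) throws IOException {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        master.regionServerStartup(nowMs);
        return attempt;
      } catch (RemoteException e) {
        IOException cause = e.unwrap();
        if (cause instanceof ClockOutOfSyncException) {
          throw cause; // fatal: retrying cannot fix a skewed clock
        }
        // Any other remote failure: fall through and try again.
      }
    }
    throw new IOException("Gave up after " + maxAttempts + " attempts");
  }
}
```

The point of the sketch is only the branch inside the catch block: a skew 
rejection escapes the loop, while any other remote failure is retried.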

Thanks for the review, stack.  Committing this.


Summary
-------

This is a patch from Jeff Whiting.  I then polished it a little and slimmed 
down the unit test.

I uncovered a very odd coupling: LogsCleaner was being instantiated within 
ServerManager, though we don't use it there and it doesn't use SM.  So that's 
refactored out into HMaster and is started up/shut down with 
start/stopServiceThreads().

Changes from Jeff's patch:
- Moved pulling maxSkew from config into constructor rather than doing it on 
each call
- Cleaned up the logging message a bit and changed from DEBUG to WARN
- HRS side, use EnvironmentEdgeManager rather than System.currentTimeMillis 
directly
- Changed the test to operate directly on ServerManager. I had to do a bit of 
refactoring of ServerManager to get this to work, and it's not something 
anyone new would have pulled the trigger on (moving stuff into another class 
instead of the weird unnecessary coupling to ServerManager).
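
For illustration, the master-side check with those changes applied might look 
roughly like the sketch below. This is not the committed ServerManager code: 
ClockSkewSketch, SkewChecker, and checkClockSkew are hypothetical names, the 
LongSupplier clock merely stands in for the role EnvironmentEdgeManager plays, 
and the WARN line is printed to stderr rather than going through a real logger.

```java
import java.io.IOException;
import java.util.function.LongSupplier;

/** Illustrative sketch only; names and structure are assumptions, not the patch. */
class ClockSkewSketch {

  /** Stand-in for the ClockOutOfSyncException added by the patch. */
  static class ClockOutOfSyncException extends IOException {
    ClockOutOfSyncException(String message) { super(message); }
  }

  /** Hypothetical holder for the check that the patch places in ServerManager. */
  static class SkewChecker {
    private final long maxSkewMs;     // resolved once at construction, not per call
    private final LongSupplier clock; // injectable time source so tests can control it

    SkewChecker(long maxSkewMs, LongSupplier clock) {
      this.maxSkewMs = maxSkewMs;
      this.clock = clock;
    }

    /** Rejects a server whose reported time differs from ours by more than maxSkewMs. */
    void checkClockSkew(String serverName, long serverTimeMs) throws ClockOutOfSyncException {
      long skewMs = Math.abs(clock.getAsLong() - serverTimeMs);
      if (skewMs > maxSkewMs) {
        String message = "Server " + serverName + " rejected: reported time is "
            + skewMs + "ms out of sync, max allowed is " + maxSkewMs + "ms";
        System.err.println("WARN " + message); // stand-in for a WARN-level log call
        throw new ClockOutOfSyncException(message);
      }
    }
  }
}
```

With a fixed clock, for example new ClockSkewSketch.SkewChecker(30000L, () -> 
1000000L), a reported time of 1010000L is accepted and 1030001L is rejected. 
The 30000 ms limit is an arbitrary example value. Passing the clock in is what 
lets a test drive the check without touching the system time.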


This addresses bug HBASE-3168.
    http://issues.apache.org/jira/browse/HBASE-3168


Diffs (updated)
-----

  trunk/src/main/java/org/apache/hadoop/hbase/ClockOutOfSyncException.java PRE-CREATION
  trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPCProtocolVersion.java 1033288
  trunk/src/main/java/org/apache/hadoop/hbase/ipc/HMasterRegionInterface.java 1033288
  trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java 1033288
  trunk/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java 1033288
  trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 1033288
  trunk/src/test/java/org/apache/hadoop/hbase/master/TestClockSkewDetection.java PRE-CREATION

Diff: http://review.cloudera.org/r/1193/diff


Testing
-------

The newly added test passes.


Thanks,

Jonathan




> Sanity date and time check when a region server joins the cluster
> -----------------------------------------------------------------
>
>                 Key: HBASE-3168
>                 URL: https://issues.apache.org/jira/browse/HBASE-3168
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>    Affects Versions: 0.89.20100924
>         Environment: RHEL 5.5 64bit, 1 Master 4 Region Servers
>            Reporter: Jeff Whiting
>            Assignee: Jeff Whiting
>             Fix For: 0.90.0
>
>         Attachments: HBASE-3168-trunk-v1.txt, HBASE-3168-trunk-v2.txt, 
> HBASE-3168-trunk-v3.txt, HBASE-3168-v4.patch, HBASE-3168-v5.patch
>
>
> Introduce a sanity check when a RS joins the cluster to make sure its clock 
> isn't too far out of sync with the rest of the cluster.  If the RS's time is 
> skewed too far, the master would prevent it from joining, and the RS 
> would log the error and die. 
> Having a RS with even small differences in time can cause huge problems due 
> to how HBase stores values with timestamps.
> According to J-D, in ServerManager we are already doing: 
> {code}
>     HServerInfo info = new HServerInfo(serverInfo);
>     checkIsDead(info.getServerName(), "STARTUP");
>     checkAlreadySameHostPort(info);
>     recordNewServer(info, false, null);
> {code}
> and the new check would fit in nicely there.
> JG suggests we add a "ClockOutOfSync-like exception"

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.