[ 
https://issues.apache.org/jira/browse/HBASE-3558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12998729#comment-12998729
 ] 

stack commented on HBASE-3558:
------------------------------

So, your ntp service was giving you bogus responses?  That's a good one.  It's 
for sure going to mess with cluster clocks.  We shouldn't be so vulnerable to 
ntp flapping.  How did you narrow in on the ntp service as the culprit?  That 
seems like it would take a bit of detective work to figure out.

Regionserver heartbeating is going away, but we could have RSs publish their 
times to zk along w/ load, and the master could yell if any go astray.
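
A rough sketch of what the master-side check could look like, assuming each RS 
publishes its wall-clock time in millis somewhere the master can read it. The 
names here (ClockSkewChecker, findSkewed) and the 30s limit are made up for 
illustration and are not existing HBase or ZooKeeper APIs. A published time is 
already a little stale when the master reads it, so the limit would need to 
allow for the publish interval.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical master-side check; none of these names are HBase APIs. */
public class ClockSkewChecker {
    private final long maxSkewMillis;

    public ClockSkewChecker(long maxSkewMillis) {
        this.maxSkewMillis = maxSkewMillis;
    }

    /**
     * @param reportedTimes server name -> wall-clock millis the RS last published
     * @param masterNow     master wall-clock millis at the time of the check
     * @return names of servers whose clock differs from the master's by more
     *         than maxSkewMillis, sorted by server name
     */
    public List<String> findSkewed(Map<String, Long> reportedTimes, long masterNow) {
        List<String> skewed = new ArrayList<>();
        // TreeMap copy only to get a stable, sorted order for reporting.
        for (Map.Entry<String, Long> e : new TreeMap<>(reportedTimes).entrySet()) {
            long skew = Math.abs(masterNow - e.getValue());
            if (skew > maxSkewMillis) {
                skewed.add(e.getKey());
            }
        }
        return skewed;
    }

    public static void main(String[] args) {
        ClockSkewChecker checker = new ClockSkewChecker(30_000L);
        Map<String, Long> reported = new TreeMap<>();
        long now = System.currentTimeMillis();
        reported.put("rs1", now + 500L);     // within tolerance
        reported.put("rs2", now + 70_000L);  // ~70s ahead, as in the report
        for (String rs : checker.findSkewed(reported, now)) {
            System.out.println("WARN clock out of sync on " + rs);
        }
    }
}
```

The same list could just as well feed hbck or the master web ui rather than a 
log line.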

> Warnings if RS times are out of sync
> ------------------------------------
>
>                 Key: HBASE-3558
>                 URL: https://issues.apache.org/jira/browse/HBASE-3558
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>    Affects Versions: 0.89.20100924
>            Reporter: Sean Sechrist
>            Priority: Minor
>
> Last night we ran into a problem with the times on RSs being out of sync by 1 
> minute. The times were being reset by ~70s often because we were getting 
> different responses from pool.ntpd.org.
> This caused lost ZK sessions and problems writing to datanodes,  so all the 
> RSs kept shutting down.
> I think it would be useful to have HBaseFsck check to see if the times on the 
> region servers are out of sync. Or maybe put a warning on the master web ui 
> or something. 
> This seems related to HBASE-3168, but applies when region servers become out 
> of sync once they already joined the cluster (due to NTP issues or something 
> else).

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
