[ https://issues.apache.org/jira/browse/KUDU-3048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Serbin resolved KUDU-3048.
---------------------------------
    Fix Version/s: 1.12.0
       Resolution: Fixed

> Add time/clock synchronization metrics
> --------------------------------------
>
>                 Key: KUDU-3048
>                 URL: https://issues.apache.org/jira/browse/KUDU-3048
>             Project: Kudu
>          Issue Type: Improvement
>          Components: clock, master, tserver
>            Reporter: Alexey Serbin
>            Assignee: Alexey Serbin
>            Priority: Major
>              Labels: clock
>             Fix For: 1.12.0
>
>
> For better visibility, it would be great to add metrics reflecting time/clock
> synchronization parameters:
> * the stats on the max_error sampled while reading the underlying clock
> * the stats on time intervals when the underlying clock was extrapolated
> instead of using the actual readings: the number of such intervals and stats
> on the interval duration
> * whether hybrid clock timestamps are generated using extrapolated clock
> readings instead of real ones
> * if using the {{built-in}} time source:
> ** the difference between the tracked true time and the local wallclock
> ** the most recently computed true time
> ** the stats on the maximum error of the computed true time
> As for the rationale behind the new metrics:
> * max_error shows how far the clock is from the true time; if it is large,
> it may be time to use another set of NTP servers or to increase the
> {{\-\-max_clock_sync_error_usec}} flag value
> * the presence of extrapolation intervals for the hybrid clock signals
> periods when the NTP servers were unavailable; a possible action is to
> revisit the set of NTP servers
> * if hybrid timestamps are being extrapolated for some time, Kudu masters and
> tablet servers might crash if the clock error eventually goes beyond the
> configured threshold: it's time to start troubleshooting the issue to avoid
> possible non-availability of the cluster
> * the delta between the true time tracked by the built-in NTP client and the
> local system clock is useful for understanding how the log timestamps relate
> to the HybridClock timestamps (when using the built-in NTP client, those
> might diverge)
> * the stats on the true time computed by the built-in NTP client give insight
> into the quality of the reference NTP servers
> The new metrics can be used for monitoring and alerting, allowing for
> proactive maintenance of a Kudu cluster.
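
To illustrate the monitoring and alerting use case described above, here is a minimal sketch of an alert rule built on two of the proposed signals. It assumes the metric values have already been collected by some means; the `ClockSnapshot` type, its field names, and the `warn_fraction` parameter are hypothetical and do not correspond to actual Kudu metric names or APIs. Only the comparison against the `--max_clock_sync_error_usec` value comes from the issue description.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClockSnapshot:
    """Hypothetical snapshot of clock metrics; field names are illustrative only."""
    max_error_usec: int     # latest sampled max_error of the underlying clock
    is_extrapolating: bool  # True if hybrid timestamps use extrapolated readings


def evaluate_clock_health(snap: ClockSnapshot,
                          max_clock_sync_error_usec: int,
                          warn_fraction: float = 0.5) -> List[str]:
    """Return alert messages for one snapshot.

    max_clock_sync_error_usec is the value the server was started with;
    warn_fraction is an assumed early-warning level, not a Kudu setting.
    """
    alerts: List[str] = []
    if snap.max_error_usec >= max_clock_sync_error_usec:
        alerts.append("CRITICAL: max_error at or above the configured threshold")
    elif snap.max_error_usec >= warn_fraction * max_clock_sync_error_usec:
        alerts.append("WARNING: max_error approaching the configured threshold")
    if snap.is_extrapolating:
        alerts.append("WARNING: hybrid clock is extrapolating; check NTP servers")
    return alerts
```

A rule like this would run periodically against each master and tablet server. The early-warning level exists because, per the rationale above, a server may crash once the error exceeds the configured threshold, so an alert at the threshold itself comes too late.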