[kudu] 04/04: [doc] update on NTP clock synchronization

alexey Mon, 03 Feb 2020 09:56:33 -0800

This is an automated email from the ASF dual-hosted git repository.

alexey pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/kudu.git


commit 66515e7a3e2dce54aa5e9e31cd1dc902207f8a16
Author: Alexey Serbin <[email protected]>
AuthorDate: Thu Jan 30 10:52:32 2020 -0800

    [doc] update on NTP clock synchronization
    
    Updated NTP configuration best practices and clock synchronization
    troubleshooting tips.
    
    Change-Id: Ib3b1485b6df846a3286f52003684386e59e972ac
    Reviewed-on: http://gerrit.cloudera.org:8080/15141
    Tested-by: Kudu Jenkins
    Reviewed-by: Adar Dembo <[email protected]>
---
 docs/troubleshooting.adoc | 286 ++++++++++++++++++++++++++++++++--------------
 1 file changed, 198 insertions(+), 88 deletions(-)

diff --git a/docs/troubleshooting.adoc b/docs/troubleshooting.adoc
index b3f3331..cbb1bad 100644
--- a/docs/troubleshooting.adoc
+++ b/docs/troubleshooting.adoc
@@ -106,19 +106,24 @@ link:administration.html#change_dir_config[Changing Directory Configurations] do
 [[ntp]]
 === NTP Clock Synchronization
 
-For the master and tablet server daemons, the server's clock must be synchronized using NTP.
-In addition, the *maximum clock error* (not to be mistaken with the estimated error)
-be below a configurable threshold. The default value is 10 seconds, but it can be set with the flag
-`--max_clock_sync_error_usec`.
+The local clock of the machine where a Kudu master or tablet server is running
+must be synchronized using the Network Time Protocol (NTP) if using the `system`
+time source. The time source is controlled by the `--time_source` flag and
+by default is set to `system`.
 
-If NTP is not installed, or if the clock is reported as unsynchronized, Kudu will not
-start, and will emit a message such as:
+Kudu requires the *maximum clock error* (not to be mistaken with the estimated
+error) of the NTP-synchronized clock to be below a configurable threshold.
+The default threshold value is 10 seconds and it can be customized using the
+`--max_clock_sync_error_usec` flag.
+
+When running with the `system` time source, Kudu will not start if the local
+clock is reported as unsynchronized, and will emit a message such as:
 
 ----
 F0924 20:24:36.336809 14550 hybrid_clock.cc:191 Couldn't get the current time: Clock unsynchronized. Status: Service unavailable: Error reading clock. Clock considered unsynchronized.
 ----
 
-If NTP is installed and synchronized, but the maximum clock error is too high,
+If the machine's clock is synchronized, but the maximum clock error is too high,
 the user will see a message such as:
 
 ----
@@ -131,10 +136,13 @@ or
 Sep 17, 8:32:31.135 PM FATAL tablet_server_main.cc:38 Check failed: _s.ok() Bad status: Service unavailable: Cannot initialize clock: Cannot initialize HybridClock. Clock synchronized but error was too high (11711000 us).
 ----
 
-==== Installing NTP
+==== Installing NTP-related Packages
 
+Kudu has been well tested to work on machines whose clock is synchronized with
+`ntpd`, the NTP server from the ubiquitous NTP suite.
 
-To install NTP, use the appropriate command for your operating system:
+To install `ntpd` and other NTP-related utilities, use the appropriate command
+for your operating system:
 [cols="1,1", options="header"]
 |===
 | OS | Command
@@ -142,18 +150,68 @@ To install NTP, use the appropriate command for your operating system:
 | RHEL/CentOS | `sudo yum install ntp`
 |===
 
-If NTP is installed but not running, start it using one of these commands:
+If `ntpd` is installed but not running, start it using one of these commands
+(don't forget to run `ntpdate` first):
 [cols="1,1", options="header"]
 |===
 | OS | Command
 | Debian/Ubuntu | `sudo service ntp restart`
-| RHEL/CentOS | `sudo /etc/init.d/ntpd restart`
+| RHEL/CentOS | `sudo service ntpd restart`
 |===
 
-====  Monitoring NTP Status
-
-When NTP is installed, you can monitor the synchronization status by running
-`ntptime`. For example, a healthy system may report:
+Make sure `ntpdate` is in the list of services that run when the machine starts:
+`ntpdate` should be run prior to starting `ntpd` to avoid a long delay in
+synchronizing the machine's local clock with the true time. The smaller the
+offset between the local machine's clock and the true time, the faster the NTP
+server can synchronize the clock.
+
+When talking about 'synchronization' with true time using NTP, we are
+referring to two things:
+
+- the synchronization status of the NTP server which drives the local clock
+  of the machine
+- the synchronization status of the local machine's clock itself as reported
+  by the kernel's NTP discipline
+
+The former can be retrieved using the `ntpstat`, `ntpq`, and `ntpdc` utilities
+(they are included in the `ntp` package). The latter can be retrieved using the
+`ntptime` utility (the `ntptime` utility is also a part of the `ntp` package).
+For more information, see the manual pages of the mentioned utilities and
+the paragraph below.
+
+Sometimes it takes too long to synchronize the machine's local clock with the
+true time even if the `ntpstat` utility reports that the NTP daemon is
+synchronized with one of the reference NTP servers. In that case, the utilities
+which report on the synchronization status of the NTP daemon claim that all is
+well, but `ntptime` reports the local clock as unsynchronized, and Kudu tablet
+servers and masters refuse to start, outputting an error like the one mentioned
+above. This situation often happens if `ntpd` is run with the `-x` option.
+According to the manual page of `ntpd`, the `-x` flag raises the step threshold
+to 600 seconds, so the NTP server only slews the clock for offsets below that.
+Without `-x`, the NTP server steps the clock if the offset exceeds the default
+128 ms threshold:
+
+----
+  -x     Normally, the time is slewed if the offset is less than the
+         step threshold, which is 128 ms by default, and stepped if
+         above the threshold. This option sets the threshold to 600 s,
+         which is well within the accuracy window to set the clock manually.
+         Note: Since the slew rate of typical Unix kernels is limited to
+         0.5 ms/s, each second of      adjustment requires an amortization
+         interval of 2000 s. Thus, an adjustment as much as 600 s
+         will take almost 14 days to complete.
+----
+
+In such cases, removing the `-x` option will help synchronize the local clock
+faster.
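The amortization figures in the manual page excerpt can be reproduced with a little arithmetic. The sketch below assumes the 0.5 ms/s (500 us/s) kernel slew limit quoted above; `slew_time_seconds` and `slew_time_days` are hypothetical helpers, not part of any NTP tool.

```shell
#!/bin/sh
# Hypothetical helpers (not part of any NTP tool): estimate how long a pure
# slew takes to remove a clock offset, assuming the 0.5 ms/s (500 us/s)
# kernel slew limit quoted in the ntpd manual page excerpt.
SLEW_RATE_US_PER_S=500

# Usage: slew_time_seconds OFFSET_US
# Prints the seconds needed to slew away OFFSET_US microseconds.
slew_time_seconds() {
  echo $(( $1 / $SLEW_RATE_US_PER_S ))
}

# Usage: slew_time_days OFFSET_US
# Same estimate, rounded down to whole days.
slew_time_days() {
  echo $(( $(slew_time_seconds "$1") / 86400 ))
}
```

For example, `slew_time_seconds 1000000` prints `2000` (one second of offset needs 2000 s of slewing), and `slew_time_days 600000000` prints `13`: a 600 s offset takes 1,200,000 s, roughly 13.9 days, which matches the "almost 14 days" in the excerpt.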
+
+More information on best practices and examples of practical resolution of
+various NTP synchronization issues can be found at
+link:https://www.redhat.com/en/blog/avoiding-clock-drift-vms[clock-drift].
+
+==== Monitoring Clock Synchronization Status
+
+When the `ntp` package is installed, you can monitor the synchronization status
+of the machine's clock by running `ntptime`. For example, a system
+with a local clock that is synchronized may report:
 
 ----
 ntp_gettime() returns code 0 (OK)
@@ -167,14 +225,15 @@ ntp_adjtime() returns code 0 (OK)
   time constant 10, precision 0.001 us, tolerance 500 ppm,
 ----
 
-In particular, note the following most important pieces of output:
+Note the following most important pieces of output:
 
-- `maximum error 22455 us`: this value is well under the 10-second maximum 
error required
-  by Kudu.
-- `status 0x2001 (PLL,NANO)`: this indicates a healthy synchronization status.
+- `maximum error 22455 us`: this value is well under the 10-second maximum
+  error required by Kudu.
+- `status 0x2001 (PLL,NANO)`: this indicates the local clock is synchronized
+  with the true time up to the maximum error above.
 
-In contrast, a system without NTP properly configured and running will output
-something like the following:
+In contrast, a system with an unsynchronized local clock would report something
+like the following:
 
 ----
 ntp_gettime() returns code 5 (ERROR)
@@ -188,11 +247,32 @@ ntp_adjtime() returns code 5 (ERROR)
   time constant 10, precision 1.000 us, tolerance 500 ppm,
 ----
 
-Note the `UNSYNC` status and the 16-second maximum error.
+The `UNSYNC` status means the local clock is not synchronized with the
+true time. Because of that, the maximum reported error doesn't convey any
+meaningful estimation of the actual error.
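To connect the two `ntptime` fields above with Kudu's threshold, here is a minimal sketch of a hypothetical `check_ntptime` shell function. It assumes the `ntptime` output contains a `maximum error N us` line and a `status 0x... (FLAGS)` line as in the examples, and it compares against the 10-second default of `--max_clock_sync_error_usec`. The comparison follows this page's wording ("below a configurable threshold"), not Kudu's source code.

```shell
#!/bin/sh
# Hypothetical helper (not part of Kudu or NTP): reads `ntptime` output on
# stdin and classifies the local clock. Assumes the output contains a
# "maximum error N us" line and a "status 0x... (FLAGS)" line.
KUDU_MAX_ERROR_US=10000000  # default of --max_clock_sync_error_usec (10 s)

# Prints "unsynchronized", "too-high N", "ok N" or "unknown".
check_ntptime() {
  out=$(cat)
  # With UNSYNC set, the maximum error carries no meaningful estimate.
  if printf '%s\n' "$out" | grep -q 'status .*UNSYNC'; then
    echo "unsynchronized"
    return 1
  fi
  # Take the first "maximum error N us" value from the output.
  max_err=$(printf '%s\n' "$out" |
    sed -n 's/.*maximum error \([0-9][0-9]*\) us.*/\1/p' | head -n 1)
  if [ -z "$max_err" ]; then
    echo "unknown"
    return 2
  fi
  # The maximum error must stay below the threshold.
  if [ "$max_err" -ge "$KUDU_MAX_ERROR_US" ]; then
    echo "too-high $max_err"
    return 1
  fi
  echo "ok $max_err"
}

# Example usage: ntptime | check_ntptime
```

If `--max_clock_sync_error_usec` has been customized, set `KUDU_MAX_ERROR_US` to the same value before calling the function.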
+
+The `ntpstat` utility reports a summary of the synchronization status of
+the NTP daemon itself. For example, a system that has `ntpd` running and
+synchronized with one of its reference servers may report:
+
+----
+$ ntpstat
+synchronised to NTP server (172.18.7.3) at stratum 4
+   time correct to within 160 ms
+   polling server every 1024 s
+----
+
+Keep in mind that the synchronization status of the NTP daemon itself doesn't
+reflect the synchronization status of the local clock. The way the NTP daemon
+drives the local clock is subject to many constraints, and it may take the NTP
+daemon some time to synchronize the local clock after the daemon itself has latched
+to one of the reference servers.
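Because the daemon's status and the local clock's status can disagree, it can help to check both at once. The sketch below is a hypothetical `sync_summary` function; it assumes the exit status convention documented in the `ntpstat` manual page (0 synchronised, 1 not synchronised, 2 indeterminate) and reads `ntptime` output on stdin for the kernel's view of the local clock.

```shell
#!/bin/sh
# Hypothetical helper (not part of NTP): combines the two views of
# "synchronization".
#   $1     exit status of `ntpstat` (0 = synchronised, 1 = not synchronised,
#          2 = indeterminate, per the ntpstat manual page)
#   stdin  output of `ntptime` (the kernel's view of the local clock)
sync_summary() {
  out=$(cat)
  if ! printf '%s\n' "$out" | grep -q 'status 0x'; then
    clock="unknown"         # no status line found in the input
  elif printf '%s\n' "$out" | grep -q 'status .*UNSYNC'; then
    clock="unsynchronized"  # kernel reports UNSYNC for the local clock
  else
    clock="synchronized"
  fi
  case "$1" in
    0) echo "daemon synchronized, local clock $clock" ;;
    1) echo "daemon not synchronized, local clock $clock" ;;
    *) echo "daemon state unknown, local clock $clock" ;;
  esac
}

# Example usage:
#   ntpstat >/dev/null; rc=$?
#   ntptime | sync_summary "$rc"
```

The output `daemon synchronized, local clock unsynchronized` corresponds to the situation described above, where `ntpd` has latched onto a reference server but has not yet synchronized the local clock.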
 
-If more detailed information is needed, the `ntpq` or `ntpdc` tools
-can be used to dump further information about which network time servers
-are currently acting as sources:
+If more detailed information is needed on the synchronization status of the
+NTP server (but not the synchronization status of the local clock), the `ntpq`
+or `ntpdc` tools can be used to find out which NTP server is currently acting
+as the source of true time and which servers are considered candidates
+(either viable or not):
 
 ----
 $ ntpq -nc lpeers
@@ -205,14 +285,7 @@ $ ntpq -nc lpeers
 -69.195.159.158  128.138.140.44   2 u    9   64    1   53.885   -0.016   0.013
 *216.218.254.202 .CDMA.           1 u    6   64    1    1.475   -0.400   0.012
 +129.250.35.250  249.224.99.213   2 u    7   64    1    1.342   -0.640   0.018
- 45.76.244.193   216.239.35.4     2 u    6   64    1   17.380   -0.754   0.051
- 69.89.207.199   212.215.1.157    2 u    5   64    1   57.796   -3.411   0.059
- 171.66.97.126   .GPSs.           1 u    4   64    1    1.024   -0.374   0.018
- 66.228.42.59    211.172.242.174  3 u    3   64    1   72.409    0.895   0.964
- 91.189.89.198   17.253.34.125    2 u    2   64    1  135.195   -0.329   0.171
- 162.210.111.4   216.218.254.202  2 u    1   64    1   28.570    0.693   0.306
- 199.102.46.80   .GPS.            1 u    2   64    1   55.652   -0.039   0.019
- 91.189.89.199   17.253.34.125    2 u    1   64    1  135.265   -0.413   0.037
+
 $ ntpq -nc opeers
      remote           local      st t when poll reach   delay   offset    disp
 ==============================================================================
@@ -223,30 +296,31 @@ $ ntpq -nc opeers
 -69.195.159.158  10.17.100.238    2 u   13   64    1   53.885   -0.016 187.561
 *216.218.254.202 10.17.100.238    1 u   10   64    1    1.475   -0.400 187.543
 +129.250.35.250  10.17.100.238    2 u   11   64    1    1.342   -0.640 187.588
- 45.76.244.193   10.17.100.238    2 u   10   64    1   17.380   -0.754 187.596
- 69.89.207.199   10.17.100.238    2 u    9   64    1   57.796   -3.411 187.541
- 171.66.97.126   10.17.100.238    1 u    8   64    1    1.024   -0.374 187.578
- 66.228.42.59    10.17.100.238    3 u    7   64    1   72.409    0.895 187.589
- 91.189.89.198   10.17.100.238    2 u    6   64    1  135.195   -0.329 187.584
- 162.210.111.4   10.17.100.238    2 u    5   64    1   28.570    0.693 187.606
- 199.102.46.80   10.17.100.238    1 u    4   64    1   55.652   -0.039 187.587
- 91.189.89.199   10.17.100.238    2 u    3   64    1  135.265   -0.413 187.621
 ----
 
 TIP: Both `lpeers` and `opeers` may be helpful as `lpeers` lists refid and
 jitter, while `opeers` lists clock dispersion.
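The first character of each peer row in the `ntpq` listings is a tally code: `*` marks the peer currently used as the source of true time and `+` marks candidates. The sketch below assumes that layout (with `-n`, so the first field is a bare address); `system_peer` and `candidate_count` are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helpers (not part of NTP) for `ntpq -nc lpeers` output on
# stdin. The first character of each peer row is the tally code.

# Print the address of the current system peer ('*'), if any.
system_peer() {
  awk '/^[*]/ { print substr($1, 2) }'
}

# Print how many peers are marked as candidates ('+').
candidate_count() {
  awk '/^[+]/ { n++ } END { print n + 0 }'
}

# Example usage:
#   ntpq -nc lpeers | system_peer
#   ntpq -nc lpeers | candidate_count
```

An empty result from `system_peer` suggests `ntpd` has not yet selected a source of true time.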
 
+==== Using `chrony` for Time Synchronization
 
-[NOTE]
-====
-.Using `chrony` for time synchronization
+Some operating systems offer `chronyd` as an alternative to `ntpd` for network
+time synchronization (the OS package is called `chrony` and contains both the
+NTP server `chronyd` and the `chronyc` utility).
 
-Some operating systems offer `chrony` as an alternative to `ntpd` for network time
-synchronization. Kudu has been tested most thoroughly using `ntpd` and use of
-`chrony` is considered experimental.
+If using `chronyd` for time synchronization on Kudu nodes, the `rtcsync` option
+must be enabled in `chrony.conf`. Without `rtcsync`, the local machine's clock
+will always be reported as unsynchronized and Kudu masters and tablet servers
+will not be able to start. The following
+link:https://github.com/mlichvar/chrony/blob/994409a03697b8df68115342dc8d1e7ceeeb40bd/sys_timex.c#L162-L166[code]
+explains the observed behavior of `chronyd` when setting the synchronization
+status of the clock on Linux.
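A quick way to confirm the `rtcsync` requirement is to look for an uncommented directive in the configuration file. The sketch below is a hypothetical `has_rtcsync` function. The default path is an assumption: `chrony.conf` is typically `/etc/chrony.conf` on RHEL/CentOS and `/etc/chrony/chrony.conf` on Debian/Ubuntu.

```shell
#!/bin/sh
# Hypothetical check (not part of chrony): succeeds if the given chrony
# configuration file enables `rtcsync`. The default path is an assumption;
# Debian/Ubuntu typically use /etc/chrony/chrony.conf instead.
has_rtcsync() {
  conf=${1:-/etc/chrony.conf}
  # Match an uncommented `rtcsync` directive, ignoring leading whitespace.
  grep -Eq '^[[:space:]]*rtcsync([[:space:]]|$)' "$conf"
}

# Example usage:
#   has_rtcsync /etc/chrony.conf || echo "add 'rtcsync' to chrony.conf"
```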
 
-In order to use `chrony` for synchronization, `chrony.conf` must be configured
-with the `rtcsync` option.
+[NOTE]
+====
+Kudu has been tested most thoroughly with `ntpd`. Using `chronyd` is viable
+as well, but it is still considered experimental at this time. Check out
+link:https://issues.apache.org/jira/browse/KUDU-2573[KUDU-2573] for status
+updates and more information on this topic.
 ====
 
 ==== NTP Configuration Best Practices
@@ -254,56 +328,97 @@ with the `rtcsync` option.
 In order to provide stable time synchronization with low maximum error, follow
 these NTP configuration best practices.
 
-*Always configure at least four time sources for NTP.* In addition to providing
-redundancy in case one or more time sources becomes unavailable, The NTP protocol is
-designed to increase its accuracy with a diversity of sources. Even if your organization
-provides one or more local time servers, configuring additional remote servers is highly
-recommended for a robust setup.
+*Run `ntpdate` prior to starting the NTP server.* If the offset of the local
+clock from the true time is too large, it can take a long time before the NTP
+server synchronizes the local clock, even if it's allowed to perform step
+adjustments.
+
+*In certain public cloud environments, use the highly available NTP server
+accessible via a link-local IP address, or another dedicated NTP server provided
+as a service.* If your cluster is running in a public cloud environment,
+consult the cloud provider's documentation for the recommended NTP setup.
+Both AWS and GCE offer dedicated highly available NTP servers accessible
+from within a cloud instance via a link-local IP address.
+
+*Unless using a highly available NTP reference server accessible via a
+link-local address, always configure at least four time sources for the NTP
+server on the local machine.* In addition to providing redundancy in case one
+of the time sources becomes unavailable, this might make the configuration more
+robust, since NTP is designed to increase its accuracy with a diversity of
+sources in networks with higher round-trip times and jitter.
+
+*Use the `iburst` option for faster synchronization at startup*. The `iburst`
+option instructs `ntpd` to send an initial "burst" of time queries at startup.
+This results in faster synchronization of `ntpd` with its reference
+servers upon startup.
+
+*If the maximum clock error goes beyond the default threshold set by Kudu
+(10 seconds), consider setting a lower value for the `maxpoll` option for every
+NTP server in `ntp.conf`*. For example, consider setting `maxpoll` to 7,
+which will cause the NTP daemon to make requests to the corresponding NTP
+server at least every 128 seconds. The default maximum poll interval is 10
+(1024 seconds).
+
+[NOTE]
+====
+If using a custom `maxpoll` interval, don't set `maxpoll` too low (e.g., lower
+than 6) to avoid flooding NTP servers, especially the public ones. Otherwise,
+they may blacklist the client (i.e. the `ntpd` daemon on your machine) and stop
+providing NTP service to it altogether. If in doubt, consult the `ntp.conf`
+manual page.
+====
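The `maxpoll` values above are base-2 exponents: a value of N means a maximum poll interval of 2^N seconds, so 7 gives 128 s and the default 10 gives 1024 s. The sketch below shows that arithmetic together with the "not lower than 6" guideline from the note; `poll_interval_seconds` and `check_maxpoll` are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helpers (not part of NTP): a `maxpoll` value of N means a
# maximum poll interval of 2^N seconds.
poll_interval_seconds() {
  echo $(( 1 << $1 ))
}

# Reports the interval, or warns (on stderr) about values below 6.
check_maxpoll() {
  if [ "$1" -lt 6 ]; then
    echo "maxpoll $1 ($(poll_interval_seconds "$1") s) may flood NTP servers" >&2
    return 1
  fi
  echo "maxpoll $1 -> $(poll_interval_seconds "$1") s"
}

# Example usage:
#   check_maxpoll 7    # prints "maxpoll 7 -> 128 s"
#   check_maxpoll 10   # prints "maxpoll 10 -> 1024 s"
```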
 
-*Pick servers in your server's local geography.* For example, if your servers are located
-in Europe, pick servers from the European NTP pool. If your servers are running in a public
-cloud environment, consult the cloud provider's documentation for a recommended NTP setup.
-Many cloud providers offer highly accurate clock synchronization as a service.
+A few examples of `ntpd` configuration files:
 
-*Use the `iburst` option for faster synchronization at startup*. The `iburst` option
-instructs `ntpd` to send an initial "burst" of time queries at startup. This typically
-results in a faster time synchronization when a machine restarts.
+----
+# Use my organization's internal NTP server (server in a local network).
+server ntp1.myorg.internal iburst maxpoll 7
+# Add servers from the NTP public pool for redundancy and robustness.
+server 0.pool.ntp.org iburst maxpoll 8
+server 1.pool.ntp.org iburst maxpoll 8
+server 2.pool.ntp.org iburst maxpoll 8
+server 3.pool.ntp.org iburst maxpoll 8
+----
 
-An example NTP server list may appear as follows:
+----
+# AWS case: use dedicated NTP server available via link-local IP address.
+server 169.254.169.123 iburst
+----
 
 ----
-# Use my organization's internal NTP servers.
-server ntp1.myorg.internal iburst
-server ntp2.myorg.internal iburst
-# Provide several public pool servers for
-# redundancy and robustness.
-server 0.pool.ntp.org iburst
-server 1.pool.ntp.org iburst
-server 2.pool.ntp.org iburst
-server 3.pool.ntp.org iburst
+# GCE case: use dedicated NTP server available from within cloud instance.
+server metadata.google.internal iburst
 ----
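The practices above can be checked mechanically against the `server` lines of an `ntp.conf`. The sketch below is a hypothetical `lint_ntp_servers` function that reads the file on stdin, warns about servers without `iburst`, and warns when fewer than four servers are configured. Its link-local detection is an assumption: it treats `169.254.x.x` addresses and `metadata.google.internal` (the cloud examples above) as highly available sources that don't need the four-server rule. It ignores `pool` and other directives.

```shell
#!/bin/sh
# Hypothetical lint (not part of NTP) for the `server` lines of an ntp.conf
# read on stdin. Assumes 169.254.0.0/16 addresses and
# metadata.google.internal are the cloud-provided link-local servers.
lint_ntp_servers() {
  awk '
    $1 == "server" {
      total++
      if ($2 ~ /^169\.254\./ || $2 == "metadata.google.internal") linklocal++
      iburst = 0
      for (i = 3; i <= NF; i++) if ($i == "iburst") iburst = 1
      if (!iburst) printf "warning: %s has no iburst option\n", $2
    }
    END {
      if (total == 0) { print "error: no server lines"; exit 1 }
      if (!linklocal && total < 4)
        printf "warning: only %d servers configured; at least 4 recommended\n", total
      printf "servers: %d\n", total
    }'
}

# Example usage: lint_ntp_servers < /etc/ntp.conf
```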
 
-TIP: After configuring NTP, use the `ntpq` tool described above to verify that `ntpd` was
-able to connect to a variety of peers. If no public peers appear, it is possiblbe that
-the NTP protocol is being blocked by a firewall or other network connectivity issue.
+TIP: After configuring `ntpd`, first run the `ntpdate` tool with the same set
+of NTP servers (this assumes `ntpd` is not running while `ntpdate` runs).
+Make sure the tool reports success: check its exit status and output.
+In case of issues connecting to the NTP servers, make sure NTP traffic is not
+being blocked by a firewall (NTP uses UDP port 123 by default) or by other
+network connectivity issues. Then start the `ntpd` daemon and use the
+`ntpq` tool described above to verify that the NTP daemon is able to connect
+to its reference servers.
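To run `ntpdate` against the same set of servers that `ntpd` uses, the server hosts can be pulled out of `ntp.conf`. The sketch below is a hypothetical `ntp_conf_servers` function that reads only `server` lines (not `pool` or other directives). The service name in the usage comment follows the RHEL/CentOS table above and is an assumption for other systems.

```shell
#!/bin/sh
# Hypothetical helper (not part of NTP): print the hosts from the `server`
# lines of an ntp.conf read on stdin, space-separated on one line.
ntp_conf_servers() {
  awk '$1 == "server" { printf "%s%s", sep, $2; sep = " " }
       END { if (sep) print "" }'
}

# Example usage (assumes ntpd is stopped; RHEL/CentOS service name):
#   servers=$(ntp_conf_servers < /etc/ntp.conf)
#   sudo ntpdate $servers && sudo service ntpd start
```

Chaining with `&&` starts `ntpd` only if `ntpdate` exits successfully, which matches the advice to check its exit status first.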
 
 ==== Troubleshooting NTP Stability Problems
 
-As of Kudu 1.6.0, Kudu daemons are able to continue to operate during a brief loss of
-NTP synchronization. If NTP synchronization is lost for several hours, however, daemons
-may crash. If a daemon crashes due to NTP synchronization issues, consult the `ERROR` log
-for a dump of related information which may help to diagnose the issue.
+As of Kudu 1.6.0, Kudu daemons are able to continue to operate during a brief
+loss of clock synchronization. If clock synchronization is lost for several
+hours, daemons may crash. If a daemon crashes due to clock synchronization
+issues, consult the `ERROR` log for a dump of related information which may
+help to diagnose the issue.
 
 TIP: Kudu 1.5.0 and earlier versions were less resilient to brief NTP outages. In
 addition, they contained a link:https://issues.apache.org/jira/browse/KUDU-2209[bug]
 which could cause Kudu to incorrectly measure the maximum error, resulting in
 crashes. If you experience crashes related to clock synchronization on these
-earlier versions of Kudu and it appears that the system's NTP configuration is correct,
-consider upgrading to Kudu 1.6.0 or later.
+earlier versions of Kudu and it appears that the system's NTP configuration
+is correct, consider upgrading to Kudu 1.6.0 or later.
 
-TIP: NTP requires a network connection and may take a few minutes to synchronize the clock
-at startup. In some cases a spotty network connection may make NTP report the clock as unsynchronized.
-A common, though temporary, workaround for this is to restart NTP with one of the commands above.
+TIP: If using an NTP server other than a link-local one, it may take some time
+for `ntpd` to synchronize with one of its reference servers in case of network
+connectivity issues. In case of a spotty network between the machine and the
+reference NTP servers, `ntpd` may become unsynchronized with its reference NTP
+servers. If that happens, consider finding another set of reference NTP
+servers: the best bet is to use NTP servers in the local network or
+`*.pool.ntp.org` servers.
 
 [[disk_space_usage]]
 == Disk Space Usage
@@ -315,7 +430,6 @@ it uses. This means that some tools may inaccurately report the disk space
 used by Kudu. For example, the size listed by `ls -l` does not accurately
 reflect the disk space used by Kudu data files:
 
-[source,bash]
 ----
 $ ls -lh /data/kudu/tserver/data
 total 117M
@@ -340,7 +454,6 @@ file's disk space usage.
 
 The `du` and `df` utilities report the actual disk space usage by default.
 
-[source,bash]
 ----
 $ du -h /data/kudu/tserver/data
 118M   /data/kudu/tserver/data
@@ -348,7 +461,6 @@ $ du -h /data/kudu/tserver/data
 
 The apparent size can be shown with the `--apparent-size` flag to `du`.
 
-[source,bash]
 ----
 $ du -h --apparent-size /data/kudu/tserver/data
 1.7G  /data/kudu/tserver/data
@@ -374,7 +486,6 @@ It is also possible to force Kudu to create a minidump without killing the
 process by sending a `USR1` signal to the `kudu-tserver` or `kudu-master`
 process. For example:
 
-[source,bash]
 ----
 sudo pkill -USR1 kudu-tserver
 ----
@@ -571,7 +682,6 @@ Kudu provides a set of useful metrics for evaluating the performance of the
 block cache, which can be found on the `/metrics` endpoint of the web UI. An
 example set:
 
-[source,json]
 ----
 {
   "name": "block_cache_inserts",
