This is an automated email from the ASF dual-hosted git repository.

granthenke pushed a commit to branch branch-1.12.x
in repository https://gitbox.apache.org/repos/asf/kudu.git

commit 1566ae27fc6ec0143dc4b2b809e2971665bca203
Author: Andrew Wong <[email protected]>
AuthorDate: Fri May 15 16:31:07 2020 -0700

    docs: add docs to orchestrate a rolling restart

    Change-Id: I268928ccdf23863880349716b9e5a848a0e443bb
    Reviewed-on: http://gerrit.cloudera.org:8080/15930
    Tested-by: Kudu Jenkins
    Reviewed-by: Alexey Serbin <[email protected]>
    Reviewed-by: Grant Henke <[email protected]>
    (cherry picked from commit 161dec90a10aa96fcc1d2ad789743f6bb37e0d48)
    Reviewed-on: http://gerrit.cloudera.org:8080/15943
    Reviewed-by: Hao Hao <[email protected]>
---
 docs/administration.adoc | 37 +++++++++++++++++++++++++++++++++++--
 1 file changed, 35 insertions(+), 2 deletions(-)

diff --git a/docs/administration.adoc b/docs/administration.adoc
index 48f7d71..2c12ac7 100644
--- a/docs/administration.adoc
+++ b/docs/administration.adoc
@@ -1202,9 +1202,9 @@ the new directory.
 WARNING: All of the command line steps below should be executed as the Kudu
 UNIX user, typically `kudu`.
 
-. Establish a
+. Use `ksck` to ensure the cluster is healthy, and establish a
   <<minimizing_cluster_disruption_during_temporary_single_ts_downtime,maintenance
-  window>> and shut down the tablet server.
+  window>> to bring the tablet server offline.
 
 . Run the tool with the desired directory configuration flags. For example, if
   a cluster was set up with `--fs_wal_dir=/wals`, `--fs_metadata_dir=/meta`, and
@@ -1532,6 +1532,39 @@ to its original value.
 
 NOTE: On Kudu versions prior to 1.8, the `--force` flag must be provided in the
 above `set_flag` commands.
 
+[[rolling_restart]]
+=== Orchestrating a rolling restart with no downtime
+
+As of Kudu 1.12, tooling is available to restart a cluster with no downtime. To
+perform such a "rolling restart", perform the following sequence:
+
+. Restart the master(s) one-by-one. If there is only a single master, this may
+  cause brief interference with on-going workloads.
+. Starting with a single tablet server, put the tablet server into
+  <<minimizing_cluster_disruption_during_temporary_single_ts_downtime,maintenance
+  mode>> by using the `kudu tserver state enter_maintenance` tool.
+. Start quiescing the tablet server using the `kudu tserver quiesce start`
+  tool. This will signal to Kudu to stop hosting leaders on the specified
+  tablet server and to redirect new scan requests to other tablet servers.
+. Periodically run `kudu tserver quiesce start` with the
+  `--error_if_not_fully_quiesced` option, until it returns success, indicating
+  that all leaders have been moved away from the tablet server and all on-going
+  scans have completed.
+. Restart the tablet server.
+. Periodically run `ksck` until the cluster is reported to be healthy.
+. Exit maintenance mode on the tablet server by running `kudu tserver state
+  exit_maintenance`. This will allow new tablet replicas to be placed on the
+  tablet server.
+. Repeat these steps for all tablet servers in the cluster.
+
+NOTE: If running with <<rack_awareness,rack awareness>>, the above steps can be
+performed restarting multiple tablet servers within a single rack at the same
+time. Users should use `ksck` to ensure the location assignment policy is
+enforced while going through these steps, and that no more than a single
+location is restarted at the same time. At least three locations should be
+defined in the cluster to safely restart multiple tablet servers within one
+location.
+
 [[rebalancer_tool]]
 === Running the tablet rebalancing tool
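The per-tablet-server portion of the procedure added by this patch lends itself to scripting. Below is a rough sketch, not part of the committed docs: the master addresses, tablet server UUIDs and hostnames, RPC ports, and the `ssh`/`systemctl` restart command are all deployment-specific assumptions. Only the `kudu tserver state`, `kudu tserver quiesce`, and `kudu cluster ksck` invocations come from the documented procedure. To keep the sketch safe to run anywhere, `kudu` and `ssh` are shadowed by functions that merely print the commands; delete those two stubs to execute for real.

```shell
#!/bin/sh
# Hypothetical rolling-restart sketch for the procedure above.
# MASTERS, the TSERVERS list, ports, and the restart command are
# assumptions about the deployment, not Kudu defaults you can rely on.
set -eu

MASTERS="master-1:7051,master-2:7051,master-3:7051"   # assumption

# "<uuid>:<host>" pairs; real UUIDs can be listed with
# `kudu tserver list $MASTERS -columns=uuid,rpc-addresses`.
TSERVERS="uuid-1:tserver-1 uuid-2:tserver-2 uuid-3:tserver-3"

# Dry-run stubs: print each command instead of executing it.
# Remove these two functions to run against a real cluster.
kudu() { echo "+ kudu $*"; }
ssh()  { echo "+ ssh $*"; }

# retry_until: rerun a command every 5 seconds until it succeeds.
retry_until() {
  until "$@"; do sleep 5; done
}

for entry in $TSERVERS; do
  uuid=${entry%%:*}
  host=${entry#*:}

  # 1. Maintenance mode: the masters will not re-replicate this
  #    server's tablets while it is briefly down.
  kudu tserver state enter_maintenance "$MASTERS" "$uuid"

  # 2. Quiesce, then poll until no leaders or active scans remain.
  kudu tserver quiesce start "$host:7050"
  retry_until kudu tserver quiesce start "$host:7050" --error_if_not_fully_quiesced

  # 3. Restart the tablet server (deployment-specific command).
  ssh "$host" sudo systemctl restart kudu-tserver

  # 4. Wait for a healthy cluster, then allow new replicas again.
  retry_until kudu cluster ksck "$MASTERS"
  kudu tserver state exit_maintenance "$MASTERS" "$uuid"
done
```

Per the rack-awareness note in the patch, one instance of this loop could be run per location, as long as `ksck` is used to confirm that no more than a single location is restarting at any time.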
