[1/4] kudu git commit: [docs] Add tip on dealing with planned TS downtime

granthenke Fri, 21 Sep 2018 06:46:31 -0700

Repository: kudu
Updated Branches:
  refs/heads/master 816bc6fd8 -> fd1ffd0fb



[docs] Add tip on dealing with planned TS downtime

Rendering available at
https://github.com/wdberkeley/kudu/blob/docfollowerunavailablesec/docs/administration.adoc.

Change-Id: I55a992a00f35945187e02c55594edc6e261a72c4
Reviewed-on: http://gerrit.cloudera.org:8080/11486
Reviewed-by: Andrew Wong <[email protected]>
Reviewed-by: Grant Henke <[email protected]>
Tested-by: Will Berkeley <[email protected]>


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/3a033d82
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/3a033d82
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/3a033d82

Branch: refs/heads/master
Commit: 3a033d829cd6aab17995b68371e7e136c47cc9b8
Parents: 816bc6f
Author: Will Berkeley <[email protected]>
Authored: Thu Sep 20 12:23:41 2018 -0700
Committer: Will Berkeley <[email protected]>
Committed: Thu Sep 20 21:32:51 2018 +0000

----------------------------------------------------------------------
 docs/administration.adoc | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/3a033d82/docs/administration.adoc
----------------------------------------------------------------------
diff --git a/docs/administration.adoc b/docs/administration.adoc
index 74de5a0..b176f58 100644
--- a/docs/administration.adoc
+++ b/docs/administration.adoc
@@ -1120,6 +1120,43 @@ a node onto another machine.
 
 . Start all Kudu processes in the cluster.
 
+[[minimizing_cluster_disruption_during_temporary_single_ts_downtime]]
+=== Minimizing cluster disruption during temporary planned downtime of a single tablet server
+
+If a single tablet server is brought down temporarily in a healthy cluster, all
+tablets will remain available and clients will function as normal, after
+potential short delays due to leader elections. However, if the downtime lasts
+for more than `--follower_unavailable_considered_failed_sec` (default 300)
+seconds, the tablet replicas on the down tablet server will be replaced by new
+replicas on available tablet servers. This will cause stress on the cluster
+as tablets re-replicate and, if the downtime lasts long enough, a significant
+reduction in the number of replicas on the down tablet server, which may
+require running the rebalancer to fix.
+
+To work around this, increase `--follower_unavailable_considered_failed_sec` on
+all tablet servers so the amount of time before re-replication will start is
+longer than the expected downtime of the tablet server, including the time it
+takes the tablet server to restart and bootstrap its tablet replicas. To do
+this, run the following command for each tablet server:
+
+[source,bash]
+----
+$ sudo -u kudu kudu tserver set_flag <tserver_address> follower_unavailable_considered_failed_sec <num_seconds>
+----
+
+where `<num_seconds>` is the number of seconds that will encompass the downtime.
+Once the downtime is finished, reset the flag to its original value.
+
+----
+$ sudo -u kudu kudu tserver set_flag <tserver_address> follower_unavailable_considered_failed_sec <original_value>
+----
+
+WARNING: Be sure to reset the value of `--follower_unavailable_considered_failed_sec`
+to its original value.
+
+NOTE: On Kudu versions prior to 1.8, the `--force` flag must be provided in the above
+commands.
+
 [[rebalancer_tool]]
 === Running the tablet rebalancing tool

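The patch asks the operator to run `kudu tserver set_flag` once per tablet server. A minimal sketch of how that repetition might be scripted follows. The function name `set_follower_timeout`, the `DRY_RUN` variable, and the example addresses are hypothetical and not part of Kudu; only the `set_flag` invocation itself is taken from the documentation above.

```shell
#!/usr/bin/env bash
# Sketch: apply one value of follower_unavailable_considered_failed_sec to
# several tablet servers. set_follower_timeout and DRY_RUN are hypothetical
# helpers for illustration; the addresses below are placeholders.

set_follower_timeout() {
  local value="$1"
  shift
  local addr
  for addr in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      # Dry run: print the command instead of executing it.
      echo "sudo -u kudu kudu tserver set_flag ${addr} follower_unavailable_considered_failed_sec ${value}"
    else
      # Stop at the first failure so a partially applied change is noticed.
      sudo -u kudu kudu tserver set_flag "${addr}" \
        follower_unavailable_considered_failed_sec "${value}" || return 1
    fi
  done
}

# Dry-run example with placeholder addresses.
DRY_RUN=1 set_follower_timeout 1800 ts1.example.com:7050 ts2.example.com:7050
```

Calling the same function again with the original value (300 by default, per the patch) after the downtime would cover the reset step. On Kudu versions prior to 1.8 the command would also need the `--force` flag noted in the patch.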