This is an automated email from the ASF dual-hosted git repository.

bankim pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/kudu.git


The following commit(s) were added to refs/heads/master by this push:
     new 336c65b  [doc] KUDU-2181 Update multi-master addition/removal/recovery documentation
336c65b is described below

commit 336c65bfd84a5115dc5c6be6521751d53aa0c286
Author: Bankim Bhavsar <[email protected]>
AuthorDate: Tue Apr 27 14:43:33 2021 -0700

    [doc] KUDU-2181 Update multi-master addition/removal/recovery documentation
    
    This change updates the documentation for:
    1) "Migrating to Multiple Masters" that uses the new
    `kudu master add` CLI command merged recently.
    
    2) "Removing Kudu Masters from a Multi-Master Deployment"
    that uses the `kudu master remove` CLI tool.
    
    3) "Recovering from a dead Kudu Master in a Multi-Master Deployment"
    that uses a combination of master remove and add CLI tools.
    
    This change doesn't include any version specific steps
    as this doc is meant for the latest Kudu version 1.15.0.
    The idea is to introduce an index page for documentation that
    points to version specific docs.
    
    This change also removes Cloudera Manager(CM) specific instructions
    as they could change with automation in CM.
    
    The rendered version of the doc can be viewed here:
    
https://github.com/bbhavsar/kudu/blob/bankim/r5/docs/administration.adoc#migrate_to_multi_master
    
    Change-Id: I6a1d5bc6bbf4bc3e82e7046469d2682bf016d3a8
    Reviewed-on: http://gerrit.cloudera.org:8080/17352
    Reviewed-by: Andrew Wong <[email protected]>
    Tested-by: Kudu Jenkins
---
 docs/administration.adoc | 328 ++++++++++++++---------------------------------
 1 file changed, 97 insertions(+), 231 deletions(-)

diff --git a/docs/administration.adoc b/docs/administration.adoc
index 11ed047..c48238e 100644
--- a/docs/administration.adoc
+++ b/docs/administration.adoc
@@ -526,6 +526,9 @@ WARNING: The workflow presupposes at least basic familiarity with Kudu configura
 using vendor-specific tools the workflow also presupposes familiarity with
 it and the vendor's instructions should be used instead as details may differ.
 
+NOTE: Starting with Kudu 1.15.0, the new `kudu master add` command simplifies
+the orchestration of migrating an existing Kudu cluster to multiple masters.
+
 ==== Prepare for the migration
 
 . Establish a maintenance window (one hour should be sufficient). During this 
time the Kudu cluster
@@ -534,31 +537,13 @@ it and the vendor's instructions should be used instead as details may differ.
 . Decide how many masters to use. The number of masters should be odd. Three 
or five node master
   configurations are recommended; they can tolerate one or two failures 
respectively.
 
-. Perform the following preparatory steps for the existing master:
-* Identify and record the directories where the master's write-ahead log (WAL) 
and data live. If
-  using Kudu system packages, their default locations are 
/var/lib/kudu/master, but they may be
-  customized via the `fs_wal_dir` and `fs_data_dirs` configuration parameters. 
The commands below
-  assume that `fs_wal_dir` is /data/kudu/master/wal and `fs_data_dirs` is 
/data/kudu/master/data.
-  Your configuration may differ. For more information on configuring these 
directories, see the
-  link:configuration.html#directory_configuration[Kudu Configuration docs].
-* Identify and record the port the master is using for RPCs. The default port 
value is 7051, but it
-  may have been customized using the `rpc_bind_addresses` configuration 
parameter.
-* Identify the master's UUID. It can be fetched using the following command:
-+
-[source,bash]
-----
-$ sudo -u kudu kudu fs dump uuid --fs_wal_dir=<master_wal_dir> 
[--fs_data_dirs=<master_data_dir>] 2>/dev/null
-----
-master_data_dir:: existing master's previously recorded data directory
-+
-[source,bash]
-Example::
-+
-----
-$ sudo -u kudu kudu fs dump uuid --fs_wal_dir=/data/kudu/master/wal 
--fs_data_dirs=/data/kudu/master/data 2>/dev/null
-4aab798a69e94fab8d77069edff28ce0
-----
-+
+. Perform the following preparatory steps for the existing masters:
+* If migrating from a single master to multiple masters, ensure the
+`--master_addresses` flag is specified in the single master's configuration, as
+it is required for the migration to multiple masters. This can be checked using
+the `kudu master get_flags` command. If the flag is not specified, supply
+`--master_addresses=<hostname>:<port>` to the master's configuration and
+restart the single master.
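+The flag can be checked, for example, as follows (the master hostname, port,
+and the `--flags` filter option below are illustrative):
++
+[source,bash]
+----
+$ kudu master get_flags master-1:7051 --flags=master_addresses
+----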
+
 * Optional: configure a DNS alias for the master. The alias could be a DNS 
cname (if the machine
   already has an A record in DNS), an A record (if the machine is only known 
by its IP address),
   or an alias in /etc/hosts. The alias should be an abstract representation of 
the master (e.g.
@@ -570,16 +555,16 @@ bringing the cluster down for maintenance, and as such, it is highly recommended
 . If you have Kudu tables that are accessed from Impala, you must update
 the master addresses in the Apache Hive Metastore (HMS) database.
 * If you set up the DNS aliases, run the following statement in `impala-shell`,
-replacing `master-1`, `master-2`, and `master-3` with your actual aliases.
+replacing `master-1` and `master-2` with your actual aliases.
 +
 [source,sql]
 ----
 ALTER TABLE table_name
 SET TBLPROPERTIES
-('kudu.master_addresses' = 'master-1,master-2,master-3');
+('kudu.master_addresses' = 'master-1,master-2');
 ----
 +
-* If you do not have DNS aliases set up, see Step #11 in the Performing
+* If you do not have DNS aliases set up, see Step #7 in the Performing
 the migration section for updating HMS.
 +
 . Perform the following preparatory steps for each new master:
@@ -594,100 +579,48 @@ the migration section for updating HMS.
 
 [[perform-the-migration]]
 ==== Perform the migration
+Starting with version 1.15.0, the `kudu master add` CLI command orchestrates the
+migration of an existing Kudu cluster to multiple masters.
 
-. Stop all the Kudu processes in the entire cluster.
-
-. Format the data directory on each new master machine, and record the 
generated UUID. Use the
-  following command sequence:
-+
-[source,bash]
-----
-$ sudo -u kudu kudu fs format --fs_wal_dir=<master_wal_dir> 
[--fs_data_dirs=<master_data_dir>]
-$ sudo -u kudu kudu fs dump uuid --fs_wal_dir=<master_wal_dir> 
[--fs_data_dirs=<master_data_dir>] 2>/dev/null
-----
-+
-master_data_dir:: new master's previously recorded data directory
-+
-[source,bash]
-Example::
-+
-----
-$ sudo -u kudu kudu fs format --fs_wal_dir=/data/kudu/master/wal 
--fs_data_dirs=/data/kudu/master/data
-$ sudo -u kudu kudu fs dump uuid --fs_wal_dir=/data/kudu/master/wal 
--fs_data_dirs=/data/kudu/master/data 2>/dev/null
-f5624e05f40649b79a757629a69d061e
-----
-
-. If using CM, add the new Kudu master roles now, but do not start them.
-* If using DNS aliases, override the empty value of the `Master Address` 
parameter for each role
-  (including the existing master role) with that master's alias.
-* Add the port number (separated by a colon) if using a non-default RPC port 
value.
-
-. Rewrite the master's Raft configuration with the following command, executed 
on the existing
-  master machine:
-+
-[source,bash]
-----
-$ sudo -u kudu kudu local_replica cmeta rewrite_raft_config 
--fs_wal_dir=<master_wal_dir> [--fs_data_dirs=<master_data_dir>] <tablet_id> 
<all_masters>
-----
-+
-master_data_dir:: existing master's previously recorded data directory
-tablet_id:: must be the string `00000000000000000000000000000000`
-all_masters:: space-separated list of masters, both new and existing. Each 
entry in the list must be
-  a string of the form `<uuid>:<hostname>:<port>`
-uuid::: master's previously recorded UUID
-hostname::: master's previously recorded hostname or alias
-port::: master's previously recorded RPC port number
-+
-[source,bash]
-Example::
-+
-----
-$ sudo -u kudu kudu local_replica cmeta rewrite_raft_config 
--fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data/kudu/master/data 
00000000000000000000000000000000 4aab798a69e94fab8d77069edff28ce0:master-1:7051 
f5624e05f40649b79a757629a69d061e:master-2:7051 
988d8ac6530f426cbe180be5ba52033d:master-3:7051
-----
-
-. Modify the value of the `master_addresses` configuration parameter for both 
existing master and new masters.
-  The new value must be a comma-separated list of all of the masters. Each 
entry is a string of the form `<hostname>:<port>`
-hostname:: master's previously recorded hostname or alias
-port:: master's previously recorded RPC port number
+The procedure doesn't require stopping all the Kudu processes in the entire
+cluster. However, once the migration is complete, all the Kudu processes must
+be restarted to incorporate the newly added master; this can be done without
+incurring downtime, as described in the steps below.
 
-. Start the existing master.
+The procedure supports adding only one master at a time. To add multiple
+masters, repeat the same procedure for each additional new master.
 
-. Copy the master data to each new master with the following command, executed 
on each new master
-  machine.
+. On the new master host (not on any of the existing masters), run the
+`kudu master add` command to add the new master. Look for success or error
+messages on the console or in the new master's log file. The command is
+designed to be idempotent: if it fails, fix the issue mentioned in the error
+message and run the same command again to make forward progress. After the
+procedure completes, whether successfully or not, the new master is shut down.
+The example below adds `master-2` to an existing Kudu cluster with `master-1`.
 +
 WARNING: If your Kudu cluster is secure, in addition to running as the Kudu 
UNIX user, you must
-  authenticate as the Kudu service user prior to running this command.
-+
-[source,bash]
-----
-$ sudo -u kudu kudu local_replica copy_from_remote 
--fs_wal_dir=<master_wal_dir> [--fs_data_dirs=<master_data_dir>] <tablet_id> 
<existing_master>
-----
-+
-master_data_dir:: new master's previously recorded data directory
-tablet_id:: must be the string `00000000000000000000000000000000`
-existing_master:: RPC address of the existing master and must be a string of 
the form
-`<hostname>:<port>`
-hostname::: existing master's previously recorded hostname or alias
-port::: existing master's previously recorded RPC port number
-+
-[source,bash]
-Example::
+authenticate as the Kudu service user prior to running this command.
 +
+[source,bash]
 ----
-$ sudo -u kudu kudu local_replica copy_from_remote 
--fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data/kudu/master/data 
00000000000000000000000000000000 master-1:7051
+$ sudo -u kudu kudu master add master-1 master-2 --fs_wal_dir=/data/kudu/master/wal \
+--fs_data_dirs=/data/kudu/master/data
 ----
-
-. Start all of the new masters.
-+
-WARNING: Skip the next step if using CM.
 +
-. Modify the value of the `tserver_master_addrs` configuration parameter for 
each tablet server.
-  The new value must be a comma-separated list of masters where each entry is 
a string of the form
-  `<hostname>:<port>`
+. Modify the value of the `master_addresses` configuration parameter for the
+existing masters only, as the new master is already configured with the
+updated `master_addresses`. The new value must be a comma-separated list of
+all of the masters. Each entry is a string of the form `<hostname>:<port>`
 hostname:: master's previously recorded hostname or alias
 port:: master's previously recorded RPC port number
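+For example, with two masters on the default RPC port, each existing master's
+configuration would carry a flag of the following form (hostnames below are
+placeholders):
++
+[source,bash]
+----
+--master_addresses=master-1:7051,master-2:7051
+----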
 
-. Start all of the tablet servers.
+. Restart the existing masters one by one.
+. Start the new master.
+. Modify the value of the `tserver_master_addrs` configuration parameter for
+each tablet server. The new value must be a comma-separated list of masters
+where each entry is a string of the form `<hostname>:<port>`
+hostname:: master's previously recorded hostname or alias
+port:: master's previously recorded RPC port number
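+For example, a tablet server's configuration would carry a flag of the
+following form (hostnames below are placeholders):
++
+[source,bash]
+----
+--tserver_master_addrs=master-1:7051,master-2:7051
+----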
+. Restart all the tablet servers to pick up the new master configuration.
 . If you have Kudu tables that are accessed from Impala and you didn't set up
 DNS aliases, update the HMS database manually in the underlying database that
 provides the storage for HMS.
@@ -697,8 +630,8 @@ provides the storage for HMS.
 ----
 UPDATE TABLE_PARAMS
 SET PARAM_VALUE =
-  'master-1.example.com,master-2.example.com,master-3.example.com'
-WHERE PARAM_KEY = 'kudu.master_addresses' AND PARAM_VALUE = 'old-master';
+  'master-1.example.com,master-2.example.com'
+WHERE PARAM_KEY = 'kudu.master_addresses' AND PARAM_VALUE = 'master-1.example.com';
 ----
 +
 * In `impala-shell`, run:
@@ -708,14 +641,13 @@ WHERE PARAM_KEY = 'kudu.master_addresses' AND PARAM_VALUE = 'old-master';
 INVALIDATE METADATA;
 ----
 
-
 ==== Verify the migration was successful
 
 To verify that all masters are working properly, perform the following sanity 
checks:
 
-* Using a browser, visit each master's web UI. Look at the /masters page. All of the masters should
+* Using a browser, visit each master's web UI. Look at the `/masters` page. All the masters should
   be listed there with one master in the LEADER role and the others in the FOLLOWER role. The
-  contents of /masters on each master should be the same.
+  contents of `/masters` on each master should be the same.
 
 * Run a Kudu system check (ksck) on the cluster using the `kudu` command line
   tool. See <<ksck>> for more details.
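+For example, assuming a two-master cluster (hostnames below are placeholders):
++
+[source,bash]
+----
+$ sudo -u kudu kudu cluster ksck master-1,master-2
+----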
@@ -727,14 +659,10 @@ important to replace the dead master; otherwise a second 
failure may lead to a l
 depending on the number of available masters. This workflow describes how to 
replace the dead
 master.
 
-Due to https://issues.apache.org/jira/browse/KUDU-1620[KUDU-1620], it is not 
possible to perform
-this workflow without also restarting the live masters. As such, the workflow 
requires a
-maintenance window, albeit a potentially brief one if the cluster was set up 
with DNS aliases.
-
-WARNING: Kudu does not yet support live Raft configuration changes for 
masters. As such, it is only
-possible to replace a master if the deployment was created with DNS aliases or 
if every node in the
-cluster is first shut down. See the <<migrate_to_multi_master,multi-master 
migration workflow>> for
-more details on deploying with DNS aliases.
+WARNING: Replacing a master created without DNS aliases requires an
+unavailability window while tablet servers are restarted to pick up the
+replacement master at a different hostname. See the
+<<migrate_to_multi_master,multi-master migration workflow>> for more details on
+deploying with DNS aliases.
 
 WARNING: The workflow presupposes at least basic familiarity with Kudu 
configuration management. If
 using vendor-specific tools the workflow also presupposes familiarity with
@@ -753,131 +681,64 @@ WARNING: All of the command line steps below should be executed as the Kudu UNIX
 . Ensure that the dead master is well and truly dead. Take whatever steps 
needed to prevent it from
   accidentally restarting; this can be quite dangerous for the cluster 
post-recovery.
 
-. Choose one of the remaining live masters to serve as a basis for recovery. 
The rest of this
-  workflow will refer to this master as the "reference" master.
-
 . Choose an unused machine in the cluster where the new master will live. The 
master generates very
-  little load so it can be collocated with other data services or 
load-generating processes, though
+  little load, so it can be collocated with other data services or 
load-generating processes, though
   not with another Kudu master from the same configuration.
   The rest of this workflow will refer to this master as the "replacement" 
master.
 
 . Perform the following preparatory steps for the replacement master:
+* If reusing the dead master's machine as the replacement master, first delete
+the dead master's WAL and data directories.
 * Ensure Kudu is installed on the machine, either via system packages (in 
which case the `kudu` and
-  `kudu-master` packages should be installed), or via some other means.
+`kudu-master` packages should be installed), or via some other means.
 * Choose and record the directory where the master's data will live.
 
-. Perform the following preparatory steps for each live master:
-* Identify and record the directory where the master's data lives. If using 
Kudu system packages,
-  the default value is /var/lib/kudu/master, but it may be customized via the 
`fs_wal_dir` and
-  `fs_data_dirs` configuration parameters. Please note if you've set 
`fs_data_dirs` to some directories
-  other than the value of `fs_wal_dir`, it should be explicitly included in 
every command below where
-  `fs_wal_dir` is also included. For more information on configuring these 
directories, see the
-  link:configuration.html#directory_configuration[Kudu Configuration docs].
-* Identify and record the master's UUID. It can be fetched using the following 
command:
-+
-[source,bash]
-----
-$ sudo -u kudu kudu fs dump uuid --fs_wal_dir=<master_wal_dir> 
[--fs_data_dirs=<master_data_dir>] 2>/dev/null
-----
-master_data_dir:: live master's previously recorded data directory
-+
-[source,bash]
-Example::
-+
-----
-$ sudo -u kudu kudu fs dump uuid --fs_wal_dir=/data/kudu/master/wal 
--fs_data_dirs=/data/kudu/master/data 2>/dev/null
-80a82c4b8a9f4c819bab744927ad765c
-----
-+
-. Perform the following preparatory steps for the reference master:
-* Identify and record the directory where the master's data lives. If using 
Kudu system packages,
-  the default value is /var/lib/kudu/master, but it may be customized via the 
`fs_wal_dir` and
-  `fs_data_dirs` configuration parameters. Please note if you've set 
`fs_data_dirs` to some directories
-  other than the value of `fs_wal_dir`, it should be explicitly included in 
every command below where
-  `fs_wal_dir` is also included. For more information on configuring these 
directories, see the
-  link:configuration.html#directory_configuration[Kudu Configuration docs].
-* Identify and record the UUIDs of every master in the cluster, using the 
following command:
-+
-[source,bash]
-----
-$ sudo -u kudu kudu local_replica cmeta print_replica_uuids 
--fs_wal_dir=<master_wal_dir> [--fs_data_dirs=<master_data_dir>] <tablet_id> 
2>/dev/null
-----
-master_data_dir:: reference master's previously recorded data directory
-tablet_id:: must be the string `00000000000000000000000000000000`
-+
-[source,bash]
-Example::
-+
-----
-$ sudo -u kudu kudu local_replica cmeta print_replica_uuids 
--fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data/kudu/master/data 
00000000000000000000000000000000 2>/dev/null
-80a82c4b8a9f4c819bab744927ad765c 2a73eeee5d47413981d9a1c637cce170 
1c3f3094256347528d02ec107466aef3
-----
-+
-. Using the two previously-recorded lists of UUIDs (one for all live masters 
and one for all
-  masters), determine and record (by process of elimination) the UUID of the 
dead master.
-
 ==== Perform the recovery
-
-. Format the data directory on the replacement master machine using the 
previously recorded
-  UUID of the dead master. Use the following command sequence:
+. Remove the dead master from the masters' Raft configuration using the
+`kudu master remove` command. In the example below, the dead master `master-2`
+is being recovered.
 +
 [source,bash]
 ----
-$ sudo -u kudu kudu fs format --fs_wal_dir=<master_wal_dir> 
[--fs_data_dirs=<master_data_dir>] --uuid=<uuid>
+$ sudo -u kudu kudu master remove master-1,master-2 master-2
 ----
 +
-master_data_dir:: replacement master's previously recorded data directory
-uuid:: dead master's previously recorded UUID
-+
-[source,bash]
-Example::
+. On the replacement master host, add the replacement master to the cluster
+using the `kudu master add` command. Look for success or error messages on the
+console or in the replacement master's log file. The command is designed to be
+idempotent: if it fails, fix the issue mentioned in the error message and run
+the same command again to make forward progress. After the procedure completes,
+whether successfully or not, the replacement master is shut down. In the
+example below, the replacement master `master-2` is used. If a DNS alias is
+not being used, use the hostname of the replacement master.
 +
+[source,bash]
 ----
-$ sudo -u kudu kudu fs format --fs_wal_dir=/data/kudu/master/wal 
--fs_data_dirs=/data/kudu/master/data --uuid=80a82c4b8a9f4c819bab744927ad765c
+$ sudo -u kudu kudu master add master-1 master-2 --fs_wal_dir=/data/kudu/master/wal \
+--fs_data_dirs=/data/kudu/master/data
 ----
 +
-. Copy the master data to the replacement master with the following command:
-+
-WARNING: If your Kudu cluster is secure, in addition to running as the Kudu 
UNIX user, you must
-  authenticate as the Kudu service user prior to running this command.
-+
-[source,bash]
-----
-$ sudo -u kudu kudu local_replica copy_from_remote 
--fs_wal_dir=<master_wal_dir> [--fs_data_dirs=<master_data_dir>] <tablet_id> 
<reference_master>
-----
-+
-master_data_dir:: replacement master's previously recorded data directory
-tablet_id:: must be the string `00000000000000000000000000000000`
-reference_master:: RPC address of the reference master and must be a string of 
the form
-`<hostname>:<port>`
-hostname::: reference master's previously recorded hostname or alias
-port::: reference master's previously recorded RPC port number
-+
-[source,bash]
-Example::
-+
-----
-$ sudo -u kudu kudu local_replica copy_from_remote 
--fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data/kudu/master/data 
00000000000000000000000000000000 master-2:7051
-----
-+
-. If using CM, add the replacement Kudu master role now, but do not start it.
-* Override the empty value of the `Master Address` parameter for the new role 
with the replacement
-  master's alias.
-* Add the port number (separated by a colon) if using a non-default RPC port 
value.
 
 . If the cluster was set up with DNS aliases, reconfigure the DNS alias for 
the dead master to point
   at the replacement master.
 
 . If the cluster was set up without DNS aliases, perform the following steps:
-* Stop the remaining live masters.
-* Rewrite the Raft configurations on these masters to include the replacement 
master. See Step 4 of
-  <<perform-the-migration, Perform the Migration>> for more details.
+.. Modify the value of the `master_addresses` configuration parameter for each
+live master, removing the dead master and substituting the replacement master.
+The new value must be a comma-separated list of masters where each entry is a
+string of the form `<hostname>:<port>`
+hostname:: master's previously recorded hostname or alias
+port:: master's previously recorded RPC port number
+.. Restart the remaining live masters.
 
 . Start the replacement master.
 
-. Restart the remaining masters in the new multi-master deployment. While the 
masters are shut down,
-  there will be an availability outage, but it should last only as long as it 
takes for the masters
-  to come back up.
+. If the cluster was set up without DNS aliases, follow the steps below for
+the tablet servers:
+.. Modify the value of the `tserver_master_addrs` configuration parameter for
+each tablet server, removing the dead master and substituting the replacement
+master. The new value must be a comma-separated list of masters where each
+entry is a string of the form `<hostname>:<port>`
+hostname:: master's previously recorded hostname or alias
+port:: master's previously recorded RPC port number
+
+.. Restart all the tablet servers.
 
 Congratulations, the dead master has been replaced! To verify that all masters 
are working properly,
 consider performing the following sanity checks:
@@ -910,28 +771,33 @@ will be unavailable.
 `/masters` page of any master's web UI. This master must not be removed during 
this process; its
 removal may result in severe data loss.
 
-. Stop all the Kudu processes in the entire cluster.
-
-. If using CM, remove the unwanted Kudu master.
+. Stop the unwanted Kudu master processes.
 
 ==== Perform the removal
 
-. Rewrite the Raft configuration on the remaining masters to include only the 
remaining masters. See
-Step 4 of <<perform-the-migration,Perform the Migration>> for more details.
+. Perform the Raft configuration change by running the `kudu master remove`
+tool. Only a single master can be removed at a time; if multiple masters need
+to be removed, run the tool multiple times. In the example below, `master-2` is
+removed from a Kudu cluster with the two masters `master-1` and `master-2`.
++
+[source,bash]
+----
+$ sudo -u kudu kudu master remove master-1,master-2 master-2
+----
++
 
 . Remove the data directories and WAL directory on the unwanted masters. This 
is a precaution to
 ensure that they cannot start up again and interfere with the new multi-master 
deployment.
 
 . Modify the value of the `master_addresses` configuration parameter for the 
masters of the new
-multi-master deployment. If migrating to a single-master deployment, the 
`master_addresses` flag
-should be omitted entirely.
+multi-master deployment.
 
-. Start all of the masters that were not removed.
+. Restart all the masters that were not removed.
 
 . Modify the value of the `tserver_master_addrs` configuration parameter for 
the tablet servers to
 remove any unwanted masters.
 
-. Start all of the tablet servers.
+. Restart all the tablet servers.
 
 ==== Verify the migration was successful
 
