Moves supplemental HA docs to child of HA docs
Project: http://git-wip-us.apache.org/repos/asf/brooklyn-docs/repo
Commit: http://git-wip-us.apache.org/repos/asf/brooklyn-docs/commit/3a8cd9da
Tree: http://git-wip-us.apache.org/repos/asf/brooklyn-docs/tree/3a8cd9da
Diff: http://git-wip-us.apache.org/repos/asf/brooklyn-docs/diff/3a8cd9da

Branch: refs/heads/master
Commit: 3a8cd9dacd91dcabf172565344e66ccde9edf01a
Parents: 9159de3
Author: Martin Harris <[email protected]>
Authored: Tue Jun 21 13:48:15 2016 +0100
Committer: Martin Harris <[email protected]>
Committed: Thu Jun 23 10:41:51 2016 +0100

----------------------------------------------------------------------
 guide/ops/high-availability-supplemental.md     | 142 -----------------
 guide/ops/high-availability.md                  |  51 ------
 .../high-availability-supplemental.md           | 155 +++++++++++++++++++
 guide/ops/high-availability/index.md            |  53 +++++++
 guide/ops/index.md                              |   3 +-
 5 files changed, 209 insertions(+), 195 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/brooklyn-docs/blob/3a8cd9da/guide/ops/high-availability-supplemental.md
----------------------------------------------------------------------
diff --git a/guide/ops/high-availability-supplemental.md b/guide/ops/high-availability-supplemental.md
deleted file mode 100644
index eb0d135..0000000
--- a/guide/ops/high-availability-supplemental.md
+++ /dev/null
@@ -1,142 +0,0 @@
----
-title: High Availability (Supplemental)
-layout: website-normal
----
-
-This document supplements the High Availability documentation available [here](http://brooklyn.apache.org/v/latest/ops/high-availability.html)
-and provides an example of how to configure a pair of Apache Brooklyn servers to run in master-standby mode with a shared NFS datastore
-
-### Prerequisites
-- Two VMs (or physical machines) have been provisioned
-- NFS or another suitable file system has been configured and is available to both VMs*
-- An NFS folder has been mounted on both VMs at `/mnt/brooklyn-persistence` and both machines can write to the folder
-
-\* Brooklyn can be configured to use either an object store such as S3, or a shared NFS mount. The recommended option is to use an object
-store as described in the [Object Store Persistence](./persistence/#object-store-persistence) documentation. For clarity, a shared NFS folder
-is assumed in this example
-
-### Launching
-To start, download and install the latest Apache Brooklyn release on both VMs following the 'OSX / Linux' section
-of the [Running Apache Brooklyn](../start/running.html#install-apache-brooklyn) documentation
-
-On the first VM, which will be the master node, run the following to start Brooklyn in high availability mode:
-
-{% highlight bash %}
-$ bin/brooklyn launch --highAvailability master --persist auto --persistenceDir /mnt/brooklyn-persistence
-{% endhighlight %}
-
-Once Brooklyn has launched, on the second VM, run the following command to launch Brooklyn in standby mode:
-
-{% highlight bash %}
-$ bin/brooklyn launch --highAvailability auto --persist auto --persistenceDir /mnt/brooklyn-persistence
-{% endhighlight %}
-
-### Testing
-You can now confirm that Brooklyn is running in high availibility mode on the master by logging into the web console at `http://<ip-address>:8081`.
-Similarly you can log into the web console on the standby VM where you will see a warning that the server is not the high availability master.
-To test a failover, you can simply terminate the process on the first VM and log into the web console on the second VM. Upon launch, Brooklyn will
-output its PID to the file `pid.txt`; you can terminate the process by running the following command from the same directory from which you
-launched Brooklyn:
-
-{% highlight bash %}
-$ kill -9 $(cat pid.txt)
-{% endhighlight %}
-
-It is also possiblity to check the high availability state of a running Brooklyn server using the following curl command:
-
-{% highlight bash %}
-$ curl -u myusername:mypassword http://<ip-address>:8081/v1/server/ha/state
-{% endhighlight %}
-
-This will return one of the following states:
-
-{% highlight bash %}
-
-"INITIALIZING"
-"STANDBY"
-"HOT_STANDBY"
-"HOT_BACKUP"
-"MASTER"
-"FAILED"
-"TERMINATED"
-
-{% endhighlight %}
-
-Note: The quotation characters will be included in the reply
-
-To obtain information about all of the nodes in the cluster, run the following command against any of the nodes in the cluster:
-
-{% highlight bash %}
-$ curl -u myusername:mypassword http://<ip-address>:8081/v1/server/ha/states
-{% endhighlight %}
-
-This will return a JSON document describing the Brooklyn nodes in the cluster. An example of two HA Brooklyn nodes is as follows (whitespace formatting has been
-added for clarity):
-
-{% highlight yaml %}
-
-{
-  ownId: "XkJeXUXE",
-  masterId: "yAVz0fzo",
-  nodes: {
-    yAVz0fzo: {
-      nodeId: "yAVz0fzo",
-      nodeUri: "http://<server1-ip-address>:8081/",
-      status: "MASTER",
-      localTimestamp: 1466414301065,
-      remoteTimestamp: 1466414301000
-    },
-    XkJeXUXE: {
-      nodeId: "XkJeXUXE",
-      nodeUri: "http://<server2-ip-address>:8081/",
-      status: "STANDBY",
-      localTimestamp: 1466414301066,
-      remoteTimestamp: 1466414301000
-    }
-  },
-  links: { }
-}
-
-{% endhighlight %}
-
-The examples above show how to use `curl` to manually check the status of Brooklyn via its REST API. The same REST API calls can also be used by
-automated third party monitoring tools such as Monit
-
-### Failover
-When running as a HA standby node, each standby Brooklyn server (in this case there is only one standby) will check the shared persisted state
-every 1 second to determine the state of the HA master. If no heartbeat has been recorded for thirty seconds, then an election will be performed
-and one of the standby nodes will be promoted to master. At this point all requests should be directed to the new master node
-
-In the event that tasks - such as the provisioning of a new entity - are running when a failover occurs, the new master will display the current
-state of the entity, but will not resume its provisioning or re-run any partially completed tasks. In this case it will usually be necesarry
-to remove the node and reprovision it
-
-### Client Configuration
-It is the responsibility of the client to connect to the master Brooklyn server. This can be accomplished in a variety of ways:
-
-* **Reverse Proxy**
-
-  To allow the client application to automatically fail over in the event of a master server becoming unavailable, or the promotion of a new master,
-  a reverse proxy can be configured to route traffic depending on the response returned by `http://<ip-address>:8081/v1/server/ha/state` (see above).
-  If a server returns `"MASTER"`, then traffic should be routed to that server, otherwise it should not be. The client software should be configured
-  to connect to the reverse proxy server and no action is required by the client in the event of a failover
-* **Elastic IP with manual failover**
-
-  If the cloud provider you are using supports Elastic or Floating IPs, then the IP address should be allocated to the HA master, and the client
-  application configured to connect to the floating IP address. In the event of a failure of the master node, the standby node will automatically
-  be promoted to master, and the floating IP will need to be manually re-allocated to the new master node. No action is required by the client
-  in the event of a failover
-* **Client-based failover**
-
-  In this scenario, the responsibilty for determining the Brooklyn master server falls on the client application. When configuring the client
-  application, a list of all servers in the cluster is passed in at application startup. On first connection, the client application connects to
-  any of the members of the cluster to retrieve the HA states (see above). The JSON object returned is used to determine the addresses of all
-  members of the cluster, and also to determine which node is the HA master
-
-  In the event of a failure of the master node, the client application should then retrieve the HA states of the cluster from any of the other cluster
-  members. This is the same process as when the application first connects to the cluster. The client should refresh its list of cluster memebers
-  and determine which node is the HA master
-
-  It is also recommended that the client application periodically checks the status of the cluster and updates its list of addresses. This will
-  ensure that failover is still possible if the standby server(s) has been replaced. It also allows additional standby servers to be added at any
-  time

http://git-wip-us.apache.org/repos/asf/brooklyn-docs/blob/3a8cd9da/guide/ops/high-availability.md
----------------------------------------------------------------------
diff --git a/guide/ops/high-availability.md b/guide/ops/high-availability.md
deleted file mode 100644
index 5f05cca..0000000
--- a/guide/ops/high-availability.md
+++ /dev/null
@@ -1,51 +0,0 @@
----
-title: High Availability
-layout: website-normal
----
-
-Brooklyn will automatically run in HA mode if multiple Brooklyn instances are started
-pointing at the same persistence store. One Brooklyn node (e.g. the first one started)
-is elected as HA master: all *write operations* against Brooklyn entities, such as creating
-an application or invoking an effector, should be directed to the master.
-
-Once one node is running as `MASTER`, other nodes start in either `STANDBY` or `HOT_STANDBY` mode:
-
-* In `STANDBY` mode, a Brooklyn instance will monitor the master and will be a candidate
-  to become `MASTER` should the master fail. Standby nodes do *not* attempt to rebind
-  until they are elected master, so the state of existing entities is not available at
-  the standby node. However a standby server consumes very little resource until it is
-  promoted.
-
-* In `HOT_STANDBY` mode, a Brooklyn instance will read and make available the live state of
-  entities. Thus a hot-standby node is available as a read-only copy.
-  As with the standby node, if a hot-standby node detects that the master fails,
-  it will be a candidate for promotion to master.
-
-* In `HOT_BACKUP` mode, a Brooklyn instance will read and make available the live state of
-  entities, as a read-only copy. However this node is not able to become master,
-  so it can safely be used to test compatibility across different versions.
-
-To explicitly specify what HA mode a node should be in, the following CLI options are available
-for the parameter `--highAvailability`:
-
-* `disabled`: management node works in isolation; it will not cooperate with any other standby/master nodes in management plane
-* `auto`: will look for other management nodes, and will allocate itself as standby or master based on other nodes' states
-* `master`: will startup as master; if there is already a master then fails immediately
-* `standby`: will start up as lukewarm standby; if there is not already a master then fails immediately
-* `hot_standby`: will start up as hot standby; if there is not already a master then fails immediately
-* `hot_backup`: will start up as hot backup; this can be done even if there is not already a master; this node will not be a master
-
-The REST API offers live detection and control of the HA mode,
-including setting priority to control which nodes will be promoted on master failure:
-
-* `/server/ha/state`: Returns the HA state of a management node (GET),
-  or changes the state (POST)
-* `/server/ha/states`: Returns the HA states and detail for all nodes in a management plane
-* `/server/ha/priority`: Returns the HA node priority for MASTER failover (GET),
-  or sets that priority (POST)
-
-Note that when POSTing to a non-master server it is necessary to pass a `Brooklyn-Allow-Non-Master-Access: true` header.
-For example, the following cURL command could be used to change the state of a `STANDBY` node on `localhost:8082` to `HOT_STANDBY`:
-
-    curl -v -X POST -d mode=HOT_STANDBY -H "Brooklyn-Allow-Non-Master-Access: true" http://localhost:8082/v1/server/ha/state
-

http://git-wip-us.apache.org/repos/asf/brooklyn-docs/blob/3a8cd9da/guide/ops/high-availability/high-availability-supplemental.md
----------------------------------------------------------------------
diff --git a/guide/ops/high-availability/high-availability-supplemental.md b/guide/ops/high-availability/high-availability-supplemental.md
new file mode 100644
index 0000000..7b663fb
--- /dev/null
+++ b/guide/ops/high-availability/high-availability-supplemental.md
@@ -0,0 +1,155 @@
+---
+title: High Availability (Supplemental)
+layout: website-normal
+---
+
+This document supplements the [High Availability]({{ site.path.guide }}/ops/high-availability/) documentation
+and provides an example of how to configure a pair of Apache Brooklyn servers to run in master-standby mode with a shared NFS datastore
+
+### Prerequisites
+- Two VMs (or physical machines) have been provisioned
+- NFS or another suitable file system has been configured and is available to both VMs*
+- An NFS folder has been mounted on both VMs at `/mnt/brooklyn-persistence` and both machines can write to the folder
+
+\* Brooklyn can be configured to use either an object store such as S3, or a shared NFS mount. The recommended option is to use an object
+store as described in the [Object Store Persistence]({{ site.path.guide }}/ops/persistence/#object-store-persistence) documentation. For simplicity, a shared NFS folder
+is assumed in this example
+
+### Launching
+To start, download and install the latest Apache Brooklyn release on both VMs following the instructions in
+[Running Apache Brooklyn]({{ site.path.guide }}/start/running.html#install-apache-brooklyn)
+
+On the first VM, which will be the master node, run the following to start Brooklyn in high availability mode:
+
+{% highlight bash %}
+$ bin/brooklyn launch --highAvailability master --https --persist auto --persistenceDir /mnt/brooklyn-persistence
+{% endhighlight %}
+
+If you are using RPMs/deb to install, please see the [Running Apache Brooklyn]({{ site.path.guide }}/start/running.html#install-apache-brooklyn)
+documentation for the appropriate launch commands
+
+Once Brooklyn has launched, on the second VM, run the following command to launch Brooklyn in standby mode:
+
+{% highlight bash %}
+$ bin/brooklyn launch --highAvailability auto --https --persist auto --persistenceDir /mnt/brooklyn-persistence
+{% endhighlight %}
+
+### Failover
+When running as a HA standby node, each standby Brooklyn server (in this case there is only one standby) will check the shared persisted state
+every one second to determine the state of the HA master. If no heartbeat has been recorded for 30 seconds, then an election will be performed
+and one of the standby nodes will be promoted to master. At this point all requests should be directed to the new master node.
+If the master is terminated gracefully, the secondary will be immediately promoted to mater. Otherwise, the secondary will be promoted after
+heartbeats are missed for a given length of time. This defaults to 30 seconds, and is configured in brooklyn.properties using
+`brooklyn.ha.heartbeatTimeout`
+
+In the event that tasks - such as the provisioning of a new entity - are running when a failover occurs, the new master will display the current
+state of the entity, but will not resume its provisioning or re-run any partially completed tasks. In this case it may be necessary
+to remove the entity and reprovision it. In the case of a failover whilst executing a task called by an effector, it may be possible to simple
+call the effector again
+
+### Client Configuration
+It is the responsibility of the client to connect to the master Brooklyn server. This can be accomplished in a variety of ways:
+
+* ###Reverse Proxy
+
+  To allow the client application to automatically fail over in the event of a master server becoming unavailable, or the promotion of a new master,
+  a reverse proxy can be configured to route traffic depending on the response returned by `https://<ip-address>:8443/v1/server/ha/state` (see above).
+  If a server returns `"MASTER"`, then traffic should be routed to that server, otherwise it should not be. The client software should be configured
+  to connect to the reverse proxy server and no action is required by the client in the event of a failover. It can take up to 30 seconds for the
+  standby to be promoted, so the reverse proxy should retry for at least this period, or the failover time should be reconfigured to be shorter
+
+* ###Re-allocating an Elastic IP on Failover
+
+  If the cloud provider you are using supports Elastic or Floating IPs, then the IP address should be allocated to the HA master, and the client
+  application configured to connect to the floating IP address. In the event of a failure of the master node, the standby node will automatically
+  be promoted to master, and the floating IP will need to be manually re-allocated to the new master node. No action is required by the client
+  in the event of a failover. It is possible to automate the re-allocation of the floating IP if the Brooklyn servers are deployed and managed
+  by Brooklyn using the entity `org.apache.brooklyn.entity.brooklynnode.BrooklynCluster`
+
+* ###Client-based failover
+
+  In this scenario, the responsibilty for determining the Brooklyn master server falls on the client application. When configuring the client
+  application, a list of all servers in the cluster is passed in at application startup. On first connection, the client application connects to
+  any of the members of the cluster to retrieve the HA states (see above). The JSON object returned is used to determine the addresses of all
+  members of the cluster, and also to determine which node is the HA master
+
+  In the event of a failure of the master node, the client application should then retrieve the HA states of the cluster from any of the other cluster
+  members. This is the same process as when the application first connects to the cluster. The client should refresh its list of cluster memebers
+  and determine which node is the HA master
+
+  It is also recommended that the client application periodically checks the status of the cluster and updates its list of addresses. This will
+  ensure that failover is still possible if the standby server(s) has been replaced. It also allows additional standby servers to be added at any
+  time
+
+### Testing
+You can confirm that Brooklyn is running in high availibility mode on the master by logging into the web console at `https://<ip-address>:8443`.
+Similarly you can log into the web console on the standby VM where you will see a warning that the server is not the high availability master.
+
+To test a failover, you can simply terminate the process on the first VM and log into the web console on the second VM. Upon launch, Brooklyn will
+output its PID to the file `pid.txt`; you can force an immediate (non-graceful) termination of the process by running the following command
+from the same directory from which you launched Brooklyn:
+
+{% highlight bash %}
+$ kill -9 $(cat pid.txt)
+{% endhighlight %}
+
+It is also possiblity to check the high availability state of a running Brooklyn server using the following curl command:
+
+{% highlight bash %}
+$ curl -k -u myusername:mypassword https://<ip-address>:8443/v1/server/ha/state
+{% endhighlight %}
+
+This will return one of the following states:
+
+{% highlight bash %}
+
+"INITIALIZING"
+"STANDBY"
+"HOT_STANDBY"
+"HOT_BACKUP"
+"MASTER"
+"FAILED"
+"TERMINATED"
+
+{% endhighlight %}
+
+Note: The quotation characters will be included in the reply
+
+To obtain information about all of the nodes in the cluster, run the following command against any of the nodes in the cluster:
+
+{% highlight bash %}
+$ curl -k -u myusername:mypassword https://<ip-address>:8443/v1/server/ha/states
+{% endhighlight %}
+
+This will return a JSON document describing the Brooklyn nodes in the cluster. An example of two HA Brooklyn nodes is as follows (whitespace formatting has been
+added for clarity):
+
+{% highlight yaml %}
+
+{
+  ownId: "XkJeXUXE",
+  masterId: "yAVz0fzo",
+  nodes: {
+    yAVz0fzo: {
+      nodeId: "yAVz0fzo",
+      nodeUri: "https://<server1-ip-address>:8443/",
+      status: "MASTER",
+      localTimestamp: 1466414301065,
+      remoteTimestamp: 1466414301000
+    },
+    XkJeXUXE: {
+      nodeId: "XkJeXUXE",
+      nodeUri: "https://<server2-ip-address>:8443/",
+      status: "STANDBY",
+      localTimestamp: 1466414301066,
+      remoteTimestamp: 1466414301000
+    }
+  },
+  links: { }
+}
+
+{% endhighlight %}
+
+The examples above show how to use `curl` to manually check the status of Brooklyn via its REST API. The same REST API calls can also be used by
+automated third party monitoring tools such as Nagios
+

http://git-wip-us.apache.org/repos/asf/brooklyn-docs/blob/3a8cd9da/guide/ops/high-availability/index.md
----------------------------------------------------------------------
diff --git a/guide/ops/high-availability/index.md b/guide/ops/high-availability/index.md
new file mode 100644
index 0000000..5fc244e
--- /dev/null
+++ b/guide/ops/high-availability/index.md
@@ -0,0 +1,53 @@
+---
+title: High Availability
+layout: website-normal
+children:
+- high-availability-supplemental.md
+---
+
+Brooklyn will automatically run in HA mode if multiple Brooklyn instances are started
+pointing at the same persistence store. One Brooklyn node (e.g. the first one started)
+is elected as HA master: all *write operations* against Brooklyn entities, such as creating
+an application or invoking an effector, should be directed to the master.
+
+Once one node is running as `MASTER`, other nodes start in either `STANDBY` or `HOT_STANDBY` mode:
+
+* In `STANDBY` mode, a Brooklyn instance will monitor the master and will be a candidate
+  to become `MASTER` should the master fail. Standby nodes do *not* attempt to rebind
+  until they are elected master, so the state of existing entities is not available at
+  the standby node. However a standby server consumes very little resource until it is
+  promoted.
+
+* In `HOT_STANDBY` mode, a Brooklyn instance will read and make available the live state of
+  entities. Thus a hot-standby node is available as a read-only copy.
+  As with the standby node, if a hot-standby node detects that the master fails,
+  it will be a candidate for promotion to master.
+
+* In `HOT_BACKUP` mode, a Brooklyn instance will read and make available the live state of
+  entities, as a read-only copy. However this node is not able to become master,
+  so it can safely be used to test compatibility across different versions.
+
+To explicitly specify what HA mode a node should be in, the following CLI options are available
+for the parameter `--highAvailability`:
+
+* `disabled`: management node works in isolation; it will not cooperate with any other standby/master nodes in management plane
+* `auto`: will look for other management nodes, and will allocate itself as standby or master based on other nodes' states
+* `master`: will startup as master; if there is already a master then fails immediately
+* `standby`: will start up as lukewarm standby; if there is not already a master then fails immediately
+* `hot_standby`: will start up as hot standby; if there is not already a master then fails immediately
+* `hot_backup`: will start up as hot backup; this can be done even if there is not already a master; this node will not be a master
+
+The REST API offers live detection and control of the HA mode,
+including setting priority to control which nodes will be promoted on master failure:
+
+* `/server/ha/state`: Returns the HA state of a management node (GET),
+  or changes the state (POST)
+* `/server/ha/states`: Returns the HA states and detail for all nodes in a management plane
+* `/server/ha/priority`: Returns the HA node priority for MASTER failover (GET),
+  or sets that priority (POST)
+
+Note that when POSTing to a non-master server it is necessary to pass a `Brooklyn-Allow-Non-Master-Access: true` header.
+For example, the following cURL command could be used to change the state of a `STANDBY` node on `localhost:8082` to `HOT_STANDBY`:
+
+    curl -v -X POST -d mode=HOT_STANDBY -H "Brooklyn-Allow-Non-Master-Access: true" http://localhost:8082/v1/server/ha/state
+

http://git-wip-us.apache.org/repos/asf/brooklyn-docs/blob/3a8cd9da/guide/ops/index.md
----------------------------------------------------------------------
diff --git a/guide/ops/index.md b/guide/ops/index.md
index 0fcfd3b..6423d73 100644
--- a/guide/ops/index.md
+++ b/guide/ops/index.md
@@ -9,8 +9,7 @@ children:
 - brooklyn_properties.md
 - locations/
 - persistence/
-- high-availability.md
-- high-availability-supplemental.md
+- high-availability/
 - catalog/
 - rest.md
 - logging.md
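
Reviewer note: the "Client-based failover" and "Reverse Proxy" paragraphs in the new supplemental page describe logic that can be sketched in a few lines. The Python below is only an illustration, not part of the commit or of Brooklyn itself. It assumes that the `/v1/server/ha/state` and `/v1/server/ha/states` replies are standard JSON with quoted keys (the example in the docs is displayed without them), and the function names and the `example.invalid` host names are hypothetical.

```python
import json
from typing import List, Optional

# Hypothetical sample shaped like the /v1/server/ha/states example in the docs.
# Host names are placeholders; keys are quoted on the assumption that the
# real reply is standard JSON.
SAMPLE_STATES = {
    "ownId": "XkJeXUXE",
    "masterId": "yAVz0fzo",
    "nodes": {
        "yAVz0fzo": {
            "nodeId": "yAVz0fzo",
            "nodeUri": "https://server1.example.invalid:8443/",
            "status": "MASTER",
        },
        "XkJeXUXE": {
            "nodeId": "XkJeXUXE",
            "nodeUri": "https://server2.example.invalid:8443/",
            "status": "STANDBY",
        },
    },
    "links": {},
}


def is_master(state_body: str) -> bool:
    """Decide whether a /v1/server/ha/state reply body means MASTER.

    The docs note that the quotation characters are part of the reply,
    so the body is parsed as a JSON string rather than compared raw.
    """
    try:
        return json.loads(state_body) == "MASTER"
    except ValueError:
        # Anything that is not parseable is treated as "not master".
        return False


def find_master_uri(states: dict) -> Optional[str]:
    """Return the nodeUri of the node named by masterId, if it reports MASTER."""
    master = states.get("nodes", {}).get(states.get("masterId"))
    if master is None or master.get("status") != "MASTER":
        return None
    return master.get("nodeUri")


def node_uris(states: dict) -> List[str]:
    """Return every known node address, for refreshing the client's server list."""
    nodes = states.get("nodes", {}).values()
    return sorted(n["nodeUri"] for n in nodes if n.get("nodeUri"))
```

A client following the documented approach would fetch the states document from any reachable member, call `find_master_uri` to choose where to send write operations, and keep the result of `node_uris` as the list to try after a failover. Fetching the document (authentication, TLS verification, retries during the roughly 30 second promotion window) is deliberately left out of the sketch.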
