sureshanaparti commented on a change in pull request #223:
URL: https://github.com/apache/cloudstack-documentation/pull/223#discussion_r660340873

##########
File path: source/adminguide/reliability.rst
##########
@@ -78,25 +156,8 @@ HA features work with iSCSI or NFS primary storage. HA with local storage is not supported.
-HA for Hosts
-------------
-
-The user can specify a virtual machine as HA-enabled. By default, all

Review comment:
@Spaceman1984 Any particular reason to remove this text here? Has this doc been moved elsewhere?

##########
File path: source/adminguide/reliability.rst
##########
@@ -61,6 +61,84 @@ still available but the system VMs will not be able to contact the management server.
+Multiple Management Servers Support on agents
+---------------------------------------------
+
+In a Cloudstack environment with multiple management servers, an agent can be
+configured, based on an algorithm, to which management server to connect to.
+This can be useful as an internal loadbalancer or for high availability.
+An administrator is responsible for setting the list of management servers and
+choosing a sorting algorithm using global settings.
+The management server is responsible for propagating the settings to the
+connected agents.
+
+Examples of an agent includes, the process responsible for communication to the

Review comment:
What do you mean by "process" here?

##########
File path: source/adminguide/reliability.rst
##########
@@ -61,6 +61,84 @@ still available but the system VMs will not be able to contact the management server.
+Multiple Management Servers Support on agents
+---------------------------------------------
+
+In a Cloudstack environment with multiple management servers, an agent can be
+configured, based on an algorithm, to which management server to connect to.
+This can be useful as an internal loadbalancer or for high availability.
+An administrator is responsible for setting the list of management servers and
+choosing a sorting algorithm using global settings.
+The management server is responsible for propagating the settings to the
+connected agents.
+
+Examples of an agent includes, the process responsible for communication to the
+management server, running inside of the Secondary Storage Virtual Machine
+(SSVM), Console Proxy Virtual Machine (CPVM) or the cloudstack-agent running on
+a KVM host.
+
+The three global settings that need to be configured are the following:
+
+- hosts: a comma seperated list of management server IP addresses
+- indirect.agent.lb.algorithm: The algorithm for the indirect agent LB
+- indirect.agent.lb.check.interval: The preferred host check interval
+ for the agent's background task that checks and switches to an agent's
+ preferred host.
+
+These settings can be configured from the global settings page in the UI or
+using the updateConfiguration API call.
+
+The indirect.agent.lb.algorithm setting supports following algorithm options:
+
+- static: Use the list of management server IP addresses as provided.
+- roundrobin: Evenly spread hosts across management servers, based on the
+ host's id.
+- shuffle: Pseudo Randomly sort the list (this is not recommended for
+ production).
+
+Any changes to the global settings - `indirect.agent.lb.algorithm` and
+`host` does not require restarting of the management server(s) and the
+agents. A change in these global settings will be propagated to all connected
+agents.
+
+The comma-separated management server list is propagated to agents in
+following cases:
+- An addition of an agent (including ssvm, cpvm system VMs).
+- Connection or reconnection of an agent to a management server.
+- After an administrator changes the 'host' and/or the
+'indirect.agent.lb.algorithm' global settings.
+
+On the agent side, the 'host' setting is saved in its properties file as:
+`host=<comma separated addresses>@<algorithm name>`.
+
+From the agent's perspective, the first address in the propagated list
+will be considered the preferred host. A new background task can be
+activated by configuring the `indirect.agent.lb.check.interval` which is
+a cluster level global setting from CloudStack and adminitrators can also

Review comment:
```suggestion
a cluster level global setting from CloudStack and administrators can also
```

##########
File path: source/adminguide/reliability.rst
##########
@@ -61,6 +61,84 @@ still available but the system VMs will not be able to contact the management server.
+Multiple Management Servers Support on agents
+---------------------------------------------
+
+In a Cloudstack environment with multiple management servers, an agent can be
+configured, based on an algorithm, to which management server to connect to.
+This can be useful as an internal loadbalancer or for high availability.
+An administrator is responsible for setting the list of management servers and
+choosing a sorting algorithm using global settings.
+The management server is responsible for propagating the settings to the
+connected agents.
+
+Examples of an agent includes, the process responsible for communication to the
+management server, running inside of the Secondary Storage Virtual Machine
+(SSVM), Console Proxy Virtual Machine (CPVM) or the cloudstack-agent running on
+a KVM host.
+
+The three global settings that need to be configured are the following:
+
+- hosts: a comma seperated list of management server IP addresses
+- indirect.agent.lb.algorithm: The algorithm for the indirect agent LB
+- indirect.agent.lb.check.interval: The preferred host check interval
+ for the agent's background task that checks and switches to an agent's
+ preferred host.
+
+These settings can be configured from the global settings page in the UI or
+using the updateConfiguration API call.
+
+The indirect.agent.lb.algorithm setting supports following algorithm options:
+
+- static: Use the list of management server IP addresses as provided.
+- roundrobin: Evenly spread hosts across management servers, based on the
+ host's id.
+- shuffle: Pseudo Randomly sort the list (this is not recommended for
+ production).
+
+Any changes to the global settings - `indirect.agent.lb.algorithm` and
+`host` does not require restarting of the management server(s) and the
+agents. A change in these global settings will be propagated to all connected
+agents.
+
+The comma-separated management server list is propagated to agents in
+following cases:
+- An addition of an agent (including ssvm, cpvm system VMs).
+- Connection or reconnection of an agent to a management server.
+- After an administrator changes the 'host' and/or the
+'indirect.agent.lb.algorithm' global settings.
+
+On the agent side, the 'host' setting is saved in its properties file as:
+`host=<comma separated addresses>@<algorithm name>`.
+
+From the agent's perspective, the first address in the propagated list
+will be considered the preferred host. A new background task can be
+activated by configuring the `indirect.agent.lb.check.interval` which is
+a cluster level global setting from CloudStack and adminitrators can also
+override this by configuring the 'host.lb.check.interval' in the
+`agent.properties` file.
+
+When an agent gets a host and algorithm combination, the host specific
+background check interval is also sent and is dynamically reconfigured
+in the background task without need to restart agents.
+
+Note: The 'static' and 'roundrobin' algorithms, strictly checks for the

Review comment:
Can you move these algorithm details to the top, where the algorithms are described?

##########
File path: source/adminguide/reliability.rst
##########
@@ -126,6 +187,150 @@ that you want to dedicate to HA-enabled VMs. a crash.
+HA-Enabled Hosts
+----------------
+
+The user can specify a host as HA-enabled, In the event of a host
+failure, attemps will be made to recover the failed host by first
+issuing some OOBM commands. If the host recovery fails the host will be
+fenced and placed into maintenance mode. To restore the host to normal
+operation, manual intervention would then be required.
+
+Out of band management is a requirement of HA-Enabled hosts and has to be
+confiured on all intended participating hosts.
+(see `“Out of band management” <hosts.html#out-of-band-management>`_).
+
+Host-HA has granular configuration on a host/cluster/zone level. In a large
+environment, some hosts from a cluster can be HA-enabled and some not,
+
+Host-HA uses a state machine design to manage the operations of recovering
+and fencing hosts. The current status of a host is reported when quering a
+specific host.
+
+Timely health investigations are done on HA-Enabled hosts to monitor for
+any failures. Specific thersholds can be set for failed investigations,
+only when it’s exceeded, will the host transition to a different state.
+
+Host-HA uses both health checks and activity checks to make decisions on
+recovering and fencing actions. Once determined that the host is in faulty
+state (health checks failed) it runs activity checks to figure out if there is
+any disk activity on the VMs running on the specific host.
+
+HA Resource Management Service
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Review comment:
I think the resource management service is part of Host HA, so there is no need for a separate section for it.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@cloudstack.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org