GabrielBrascher commented on pull request #4978: URL: https://github.com/apache/cloudstack/pull/4978#issuecomment-933484526
@rhtyd thanks for the review. I hope that I can address all your comments here:

> we should explore options to implement this without introducing a new service (my main concern is from security and upgrade point of view, a lot of people don't like non-essential services running on hypervisor)

I understand that we should avoid adding new services, but I see HA as an essential part, and having it decoupled from the CloudStack agent helps avoid specific problems with the Java process. Additionally, this PR adds a global setting (with cluster scope), `kvm.ha.webservice.enabled`. It is set to false by default and can easily be enabled or disabled; when it is disabled, the CloudStack HA workflow skips the checks against the KVM HA Helper.

> for example, (1) what if I the admin wants to do some maintainance etc which requires stopping of the agent - in that case could your changes cause any side-effect, (2) systemd can be configured (probably already is?) to have this service always start on boot and on-crash/on-error

You are right, this is something to be careful about. We've configured the service so that it always starts on boot, and if the process/job is killed for any reason it gets restarted as well. The only way of stopping it is via systemd (e.g. `systemctl stop cloudstack-hahelper.service`).

> agent has a stop command answer it can tell mgmt server why it is stopping - that can be used intelligently to not cause HA led migrations (I haven't checked, probably already-is?)

We did not implement a way of signalling that the agent has been "intentionally stopped". This would rely on admins disabling it on the CloudStack side. I will need to add some information to the documentation about how to handle the cluster with this agent.

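To make the kill-switch behaviour described above concrete, here is a minimal sketch in Python. It is only an illustration and not the code in this PR: the function names, the `"agent"` and `"ha-helper"` check labels, and the plain-dict representation of settings are all assumptions. Only the setting name `kvm.ha.webservice.enabled` and its default of false come from the comment.

```python
from typing import List, Mapping, Optional

# Setting name taken from the PR; everything else below is hypothetical.
SETTING_KEY = "kvm.ha.webservice.enabled"


def ha_helper_enabled(cluster_settings: Mapping[str, str],
                      global_value: Optional[str] = None) -> bool:
    """Resolve the switch: cluster value first, then global value, else false."""
    raw = cluster_settings.get(SETTING_KEY, global_value)
    if raw is None:
        return False  # disabled by default
    return raw.strip().lower() == "true"


def host_checks(cluster_settings: Mapping[str, str],
                global_value: Optional[str] = None) -> List[str]:
    """Return the (hypothetical) list of checks the HA workflow would run."""
    checks = ["agent"]  # placeholder for the checks that always run
    if ha_helper_enabled(cluster_settings, global_value):
        checks.append("ha-helper")  # only consulted when the switch is on
    return checks
```

With this shape, a cluster that never sets the value simply skips the helper check, and an admin can opt in or out per cluster without touching the others.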
> if this new service is essential, can it be secured using CA-framework generated certificates so at least the communication is validated (the simplest being server certificate was signed/created against the root CA cert)

I can look into a way of adding CA certificates and validating the communication. For now, it has no such validation; however, it binds only to the node's IP on the management network (which in theory is an isolated/secure network).

> and a global setting/kill-switch for users who don't want/need this additional feature/service (for ex. NFS users?) and have it disabled by default

Perfect, this is important indeed. We've added it via `kvm.ha.webservice.enabled`. It can be set per cluster, so one can control exactly which clusters have it enabled or disabled.
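
As a rough illustration of what "binds only to the management IP" means in practice, the sketch below starts a tiny HTTP server on one explicit address instead of on all interfaces (`0.0.0.0`). It uses only the Python standard library; the handler, the response body and the `make_server` helper are hypothetical and are not the helper service from this PR.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class StatusHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with a plain 'ok'."""

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Keep the example quiet; a real service would log requests.
        pass


def make_server(bind_ip: str, port: int) -> HTTPServer:
    """Create a server that listens on bind_ip only, not on every interface."""
    return HTTPServer((bind_ip, port), StatusHandler)
```

Calling `make_server(management_ip, port).serve_forever()` would then accept connections only on that address. Note that binding restricts where the service listens but does not authenticate the peer, which is why certificate validation is still worth adding.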
