kiranchavala opened a new issue, #12699:
URL: https://github.com/apache/cloudstack/issues/12699
### problem
A CKS cluster remains in the Alert state if scaling fails due to a capacity
issue on the hypervisor host.
### versions
ACS 4.22
### The steps to reproduce the bug
Have a CloudStack environment with 2 KVM hosts in a cluster.
1. Launch a CKS cluster with size 2 (worker nodes).
The worker nodes are deployed on KVM host 2.
2. The CKS cluster is in the Running state.
3. Deploy other VMs in the CloudStack environment until the capacity of the
KVM hosts is reached.
4. Scale the CKS cluster to size 3.
5. Scaling of the CKS cluster fails due to the capacity issue.
The new worker node is left in the Stopped state.
6. The CKS cluster moves to the Alert state.
```
2026-02-24 11:12:14,223 DEBUG [c.c.k.c.KubernetesClusterManagerImpl] (Kubernetes-Cluster-State-Scanner-1:[ctx-c196e036]) (logid:43979d1a) Found VM: VM instance {"id":16,"instanceName":"i-2-16-VM","state":"Stopped","type":"User","uuid":"47386d74-3c9f-49aa-b102-1c10537c8350"} in the Kubernetes cluster KubernetesCluster {"id":2,"name":"test","uuid":"e155ab23-68ca-4c3e-b8c5-7175a3f65fda"} in state: Stopped while expected to be in state: Running. So moving the cluster to Alert state for reconciliation
2026-02-24 11:12:14,224 DEBUG [c.c.k.c.KubernetesClusterManagerImpl] (Kubernetes-Cluster-State-Scanner-1:[ctx-c196e036]) (logid:43979d1a) Found VM: VM instance {"id":9,"instanceName":"i-2-9-VM","state":"Running","type":"User","uuid":"ebf0a5a6-01b7-462a-bad6-1f61887f0f41"} in the Kubernetes cluster KubernetesCluster {"id":2,"name":"test","uuid":"e155ab23-68ca-4c3e-b8c5-7175a3f65fda"} in state: Running while expected to be in state: Stopped. So moving the cluster to Alert state for reconciliation
```
7. The worker node that is in the Stopped state cannot be removed.
An exception is thrown:
<img width="1623" height="528" alt="Image"
src="https://github.com/user-attachments/assets/32b8bad2-db9c-4686-ac57-3c26b9f9d378"
/>
### What to do about it?
The CKS cluster should go back to the Running state, since the scaling failed
due to insufficient capacity.
Currently, only the resource limit is checked during the scaling operation,
as introduced by this PR:
https://github.com/apache/cloudstack/pull/12167
Host capacity should also be checked before scaling.
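
To illustrate the kind of host capacity pre-check suggested here, below is a minimal sketch in Python. It is not CloudStack code: `HostCapacity`, `NodeOffering` and `can_place_nodes` are hypothetical names made up for this example, and it assumes that only free CPU and memory per host matter and that all new worker nodes use the same offering.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HostCapacity:
    """Free resources on one hypervisor host (hypothetical model)."""
    name: str
    free_cpu_mhz: int
    free_memory_mb: int


@dataclass
class NodeOffering:
    """Resources required by one new worker node (hypothetical model)."""
    cpu_mhz: int
    memory_mb: int


def can_place_nodes(hosts: List[HostCapacity],
                    offering: NodeOffering,
                    extra_nodes: int) -> bool:
    """Return True if `extra_nodes` identical worker nodes fit on the hosts.

    Because every new node needs the same resources, the number of nodes a
    host can take is limited by whichever of CPU or memory runs out first.
    """
    if offering.cpu_mhz <= 0 or offering.memory_mb <= 0:
        raise ValueError("offering must request positive CPU and memory")
    if extra_nodes <= 0:
        # Nothing to place, so capacity is trivially sufficient.
        return True

    placeable = 0
    for host in hosts:
        fits_by_cpu = host.free_cpu_mhz // offering.cpu_mhz
        fits_by_memory = host.free_memory_mb // offering.memory_mb
        placeable += min(fits_by_cpu, fits_by_memory)
    return placeable >= extra_nodes


# Example: scaling from 2 to 3 workers when both hosts are nearly full.
hosts = [
    HostCapacity("kvm-host-1", free_cpu_mhz=500, free_memory_mb=512),
    HostCapacity("kvm-host-2", free_cpu_mhz=800, free_memory_mb=1024),
]
offering = NodeOffering(cpu_mhz=1000, memory_mb=2048)

if not can_place_nodes(hosts, offering, extra_nodes=3 - 2):
    # Reject the request up front, so no worker VM is created in the
    # Stopped state and the cluster can stay in Running.
    print("Scale request rejected: insufficient host capacity")
```

With a check like this run before any worker VM is created, a scale request that cannot be satisfied would fail early instead of leaving a stopped node behind.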