Kukunin opened a new pull request, #13363:
URL: https://github.com/apache/cloudstack/pull/13363

   ## What's happening
   
   When a VM is stopped via the API, CloudStack moves its CPU/RAM from `used` 
to `reserved` on its `last_host_id` (see the `Stopping → Stopped + 
OperationSucceeded` branch in `CapacityManagerImpl.postStateTransitionEvent`). 
The idea is that a quick restart on the same host can reclaim its earmarked 
slot cheaply.
   
   The asymmetry: when the VM later starts on the **same** host, the 
`fromLastHost=true` branch in `allocateVmCapacity` drains that reservation. 
When it starts on a **different** host, the old reservation just sits there. It 
only clears when:
   - `capacity.skipcounting.hours` (default 1h) elapses and 
`updateCapacityForHost` recycles it, or
   - the VM is destroyed/expunged.
   
   Until then, the orphan reservation gets summed into the cluster's `used + 
reserved` aggregate by `FirstFitPlanner.removeClustersCrossingThreshold`. On a 
cluster that's already near 
`cluster.memory.allocated.capacity.disablethreshold` (default 0.85), the 
phantom can trip the threshold and block subsequent VM starts in the whole 
cluster — even though the VM in question isn't actually consuming anything on 
its old host.
   
   We hit this on a fairly full cluster where stop-then-start cycles started 
failing intermittently with `InsufficientServerCapacityException: No 
destination found`. The "ghost" capacity was the released-but-not-drained 
reservation from the last stop.
   
   ## The fix
   
   `postStateTransitionEvent` now drains the VM's reservation on its previous 
host before allocating on the target host — regardless of whether the target is 
the same host or a different one. Treating both cases identically removes the 
`fromLastHost` asymmetry.
   
   ```java
   if ((newState == State.Starting || newState == State.Migrating || event == 
Event.AgentReportMigrated) && vm.getHostId() != null) {
       if (vm.getLastHostId() != null) {
           releaseVmCapacity(vm, true, false, vm.getLastHostId());
       }
       allocateVmCapacity(vm);
   }
   ```
   
   Side effects of unifying the path:
   - The `fromLastHost=true` branch in `allocateVmCapacity` is now unreachable 
from `postStateTransitionEvent`. The only other caller 
(`VirtualMachineManagerImpl#reconfiguringOnExistingHost`) already passes 
`false`, so the parameter is removed entirely.
   - Fixes the long-standing `moveToReservered` typo (3 e's) — renamed to 
`moveToReserved` throughout the interface, the impl, and the debug logs.
   - One `logger.debug(String.format(...))` in `postStateTransitionEvent` 
switched to SLF4J `{}` placeholders to match the rest of the file.
   
   ## Tests
   
   Two new tests in `CapacityManagerImplTest` cover both transitions:
   
   - 
`testPostStateTransitionReleasesStaleReservationWhenStartingOnDifferentHost` — 
fails on `main`, passes with the fix. Verifies `releaseVmCapacity(vm, true, 
false, oldHostId)` is invoked.
   - `testPostStateTransitionReleasesReservationWhenStartingOnSameHost` — 
guards the unified contract so both paths stay in lockstep.
   
   ## Manual validation
   
   Reproduced on a small test cluster:
   1. Deploy a VM on host A, stop it (`reserved=128M` appears on A in 
`op_host_capacity`).
   2. Force a start on host B via `hostid`.
   3. Logs show the inline drain: `release mem from host: A, old reserved: 
128MB → new reserved: 0`.
   4. After the VM reaches `Running` on B, `last_host_id` updates to B and 
`updateCapacityForHost` agrees the reservation is gone.
   
   Same-host stop/start round-trip continues to behave the same way as before.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to