Github user rafaelweingartner commented on the pull request:
https://github.com/apache/cloudstack/pull/943#issuecomment-160966699
@DaanHoogland,
I checked the logs you sent me.
The VMs were marked as destroyed, but it seems that they have not been
"destroyed" or removed/expunged yet. I looked at the code, and the only way
that they are removed from the response of the list VMs methods is after the
expunge thread execution that fills out the "removed" field in the database.
I also looked at the code of the integration tests. My Perl is a little
rusty, but I noticed that the code only waits a few cycles (2) of the expunge
delay; therefore, there is no way to guarantee that the expunge thread has
already run after the VM passed the expunge delay, and hence that the VM has
been removed.
If I recall properly, there are mainly three (3) variables in play: the
time at which the VM was destroyed, the expunge delay per se, and the expunge
interval (the interval between executions of the expunge thread).
So, if the expunge thread runs, but the VM has been destroyed too recently
and has not passed the expunge delay, it will not be removed. That is what
seems to have happened there. I know some people may come and say, "the test
worked a lot of times". And yes, it can work, but it depends on whether you
are lucky or not. I personally do not like tests that may present this kind of
behavior. Moreover, the moments at which the expunge thread runs depend on the
time at which the MS was started.
I will illustrate it with an example that we have seen happening here.
Given that our expunge interval is 24 hours, and our expunge delay is also
24 hours, suppose the MS was started and got up and running on some day at
23:59 and that the first time the expunge thread runs is 00:00. If we are
unlucky and we destroy the VM at 00:01, then on the next day, when the thread
runs at 00:00 (second run of the expunge thread), the VM will not be removed
and will continue appearing, since the expunge delay that controls the VM's
removal is 24 hours and the VM has been destroyed for only 23 hours and 59
minutes (almost there, but not yet). Therefore, the VM will only be removed in
the third execution of the expunge thread.
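
To make the timing above concrete, here is a minimal sketch in Python that
simulates it. It is not CloudStack code: the function name, the date, and the
assumption that a run removes a VM once the elapsed time is at least the
expunge delay are all mine, for illustration only.

```python
from datetime import datetime, timedelta


def first_removing_run(first_run, interval, delay, destroyed_at):
    """Return (run_number, run_time) of the first expunge run that removes the VM.

    Assumption (not taken from CloudStack): a run removes the VM only when
    at least `delay` has elapsed since the VM was destroyed.
    """
    run_number, run_time = 1, first_run
    # Skip every run at which the expunge delay has not fully elapsed yet
    # (this also skips runs that happen before the VM was destroyed).
    while run_time - destroyed_at < delay:
        run_number += 1
        run_time += interval
    return run_number, run_time


day = timedelta(hours=24)
first_run = datetime(2015, 12, 1, 0, 0)           # hypothetical date, first run at 00:00
destroyed_at = first_run + timedelta(minutes=1)   # VM destroyed at 00:01

run_number, run_time = first_removing_run(first_run, day, day, destroyed_at)
print(run_number, run_time)  # 3 2015-12-03 00:00:00
```

With both values at 24 hours, the VM destroyed at 00:01 survives the second
run and is only removed by the third one, almost 48 hours after the destroy.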
Having said that, I have the following questions: what do we want with that
test? Do we want to test the expunge thread? Or just test that the destroyed
VM is not listed? If we want the second, why don't we force the expunging
(using the expungeVirtualMachine command) instead of waiting for the expunge
thread?
If the idea is to leave the test as it is, then to avoid the problem I have
just described we could just change the multiplier on line 632 of the file
test_vm_life_cycle from "4" to "6". That change would guarantee that the test
waits until the third execution of the expunge thread, and would avoid cases
such as the one described.
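
As a rough alternative to a fixed multiplier, the wait could be bounded by the
worst case (expunge delay plus one expunge interval) and the test could poll
until the VM disappears. The sketch below is only an illustration in Python:
wait_until, worst_case_wait and vm_gone are hypothetical names, not part of
the test suite or of any CloudStack API, and vm_gone stands in for a real
"list VMs" check.

```python
import time


def worst_case_wait(expunge_delay, expunge_interval):
    """Upper bound on the time between destroying a VM and its removal.

    Worst case: the delay elapses just after a run of the expunge thread,
    so the VM has to wait almost one more full interval.
    """
    return expunge_delay + expunge_interval


def wait_until(condition, timeout, poll_every=1.0):
    """Poll `condition` until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_every)


# Stand-in for "the VM is no longer listed": becomes True on the third poll.
calls = {"n": 0}

def vm_gone():
    calls["n"] += 1
    return calls["n"] >= 3

removed = wait_until(vm_gone, timeout=worst_case_wait(2, 2), poll_every=0.01)
print(removed)
```

Polling returns as soon as the VM is gone, so the test only pays the full
worst-case wait when the timing is as unlucky as in the example above.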