Arik Hadas has posted comments on this change.
Change subject: core: add job that runs HA VMs which failed to run
......................................................................
Patch Set 2:
(1 comment)
Allon, that was my intention initially - I used the isWait property of locks
to wait for the VM lock to be released. The problem with this approach is
that mass invocations of run commands might block all the threads in the
thread pool.
Think of the following scenario: in a large-scale system, one host is running
a few hundred HA VMs, and we are doing a live migration of a disk that is
used by all of them. In the middle of the live storage migration the host
crashes, so we try to automatically restart all those VMs; each thread that
runs a VM would be blocked until the live storage migration ends, and we
might end up with no threads left in the pool.
So I accepted the suggestion to use a periodic job rather than blocking
threads. Note that I have a few optimizations in mind for later (for example,
triggering the job when a live snapshot ends).
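To make the idea concrete, here is a minimal sketch of the queue-plus-periodic-job approach. The class and method names (AutoStartVmQueue, addVmToRun as described above, retryPending) are illustrative, not the actual ovirt-engine implementation: failed VMs are recorded in a set, and a periodic job retries them without any pool thread ever blocking on a lock.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

public class AutoStartVmQueue {
    // VMs that failed to start and should be retried by the periodic job.
    private final Set<String> vmsToRun = ConcurrentHashMap.newKeySet();

    // Called when the run command fails, e.g. because the VM lock is held.
    public void addVmToRun(String vmId) {
        vmsToRun.add(vmId);
    }

    // Invoked by the periodic job; tryRun returns false while the VM
    // still cannot be started (e.g. its lock is not yet released).
    public void retryPending(Predicate<String> tryRun) {
        for (String vmId : vmsToRun) {
            if (tryRun.test(vmId)) {
                vmsToRun.remove(vmId); // started successfully
            }
            // on failure the VM simply stays queued for the next iteration
        }
    }

    public boolean isPending(String vmId) {
        return vmsToRun.contains(vmId);
    }
}
```

The key property is that a retry attempt returns immediately instead of waiting on the lock, so a host crash with hundreds of HA VMs costs one quick pass per job iteration rather than hundreds of blocked threads.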
....................................................
File
backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/AutoStartVmsRunner.java
Line 50: if (!result.getSucceeded()) {
Line 51: final AuditLogableBase event = new AuditLogableBase();
Line 52: event.setVmId(vmId);
Line 53:             AuditLogDirector.log(event, AuditLogType.HA_VM_RESTART_FAILED);
Line 54: // should insert to autoStartVmsToRun again?
Note that the following scenario is handled well without inserting the VM
into the queue at this point: we try to restart an HA VM and it fails because
the VM is locked, so the VM is added to the queue and we try to run it in the
next iteration; the lock is still not released, so we cannot acquire it
again, RunVmCommand adds it to the queue via the addVmToRun method again,
we try to run it in the next iteration, and so on until the lock is released.
I was considering whether to add the VM to the queue at this point, since the
run command might fail for other reasons that are not related to locks. I
guess that in general we want to add the VM to the queue every time the run
command fails for an HA VM, but if there are cases in which the run command
is certain to keep failing (say the VM was edited and now has settings it
cannot start with), we probably don't want to re-queue the VM. So it's a
kind of TODO for me to check, but it doesn't affect the solution for the
reported bug.
Line 55: }
Line 56: }
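One way the TODO above could go is a small policy that re-queues only transient failures. This is purely a sketch; the FailureReason values and the RestartPolicy class are hypothetical, not existing ovirt-engine types:

```java
public class RestartPolicy {
    // Hypothetical classification of why the run command failed.
    public enum FailureReason { VM_LOCKED, HOST_BUSY, BAD_CONFIGURATION }

    // Transient failures are worth retrying in the next job iteration;
    // failures that would repeat forever (e.g. an invalid configuration
    // after the VM was edited) should not be re-queued.
    public static boolean shouldRequeue(FailureReason reason) {
        switch (reason) {
            case VM_LOCKED:
            case HOST_BUSY:
                return true;
            default:
                return false;
        }
    }
}
```

The caller at line 54 would then consult shouldRequeue before adding the VM back, instead of unconditionally re-queuing on every failure.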
--
To view, visit http://gerrit.ovirt.org/18815
To unsubscribe, visit http://gerrit.ovirt.org/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d563d05efc6dae40f0de8e7f4b7e47aa84bd787
Gerrit-PatchSet: 2
Gerrit-Project: ovirt-engine
Gerrit-Branch: master
Gerrit-Owner: Arik Hadas <[email protected]>
Gerrit-Reviewer: Allon Mureinik <[email protected]>
Gerrit-Reviewer: Arik Hadas <[email protected]>
Gerrit-Reviewer: Barak Azulay <[email protected]>
Gerrit-Reviewer: Michal Skrivanek <[email protected]>
Gerrit-Reviewer: Omer Frenkel <[email protected]>
Gerrit-Reviewer: Yair Zaslavsky <[email protected]>
Gerrit-Reviewer: oVirt Jenkins CI Server
Gerrit-HasComments: Yes
_______________________________________________
Engine-patches mailing list
[email protected]
http://lists.ovirt.org/mailman/listinfo/engine-patches