On Fri, Jan 19, 2018 at 12:46 PM, Michal Skrivanek <michal.skriva...@redhat.com> wrote:
>> On 18 Jan 2018, at 17:36, Arik Hadas <aha...@redhat.com> wrote:
>>
>> On Wed, Jan 17, 2018 at 9:41 PM, Milan Zamazal <mzama...@redhat.com> wrote:
>>> Dafna Ron <d...@redhat.com> writes:
>>>
>>>> We had a failure in test 006_migrations.migrate_vm
>>>> <http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4842/testReport/junit/%28root%29/006_migrations/migrate_vm/>.
>>>>
>>>> The migration failed with reason "VMExists".
>>>
>>> There are two migrations in 006_migrations.migrate_vm. The first one
>>> succeeded, but if I'm reading the logs correctly, Engine didn't send
>>> Destroy to the source host after the migration had finished. The second
>>> migration then gets rejected by Vdsm, because Vdsm still keeps the
>>> former Vm object instance in Down status.
>>>
>>> Since the test succeeds most of the time, it looks like a timing issue
>>> or border case. Arik, is it a known problem? If not, would you like to
>>> look into the logs and see whether you can tell what's happening?
>>
>> Your analysis is correct. That's a nice one, actually!
>>
>> The statistics monitoring cycles of the two hosts, host-0 and host-1,
>> were scheduled in a way that they executed almost at the same time [1].
>>
>> Now, at 6:46:34 the VM was migrated from host-1 to host-0.
>> At 6:46:42 the migration succeeded - we got events from both hosts, but
>> only processed the one from the destination, so the VM switched to Up.
>> The next statistics monitoring cycle was triggered at 6:46:44 - again,
>> the report of that VM from the source host was skipped because we
>> processed the one from the destination.
>> At 6:46:59, in the next statistics monitoring cycle, it happened again -
>> the report of the VM from the source host was skipped.
>> The next migration was triggered at 6:47:05 - the engine hadn't managed
>> to process any report from the source host, so the VM remained Down
>> there.
>>
>> The probability of this happening is extremely low.
>
> Why wasn't the migration rerun?
Good question: because a migration to a particular host (MigrateVmToServer)
was requested. In this particular case it seems that only two hosts are
defined, though, so changing it to MigrateVm wouldn't make any difference.

>> However, I think we can make a little tweak to the monitoring code to
>> avoid this:
>> "If we get the VM as Down on an unexpected host (that is, not the host
>> we expect the VM to run on), do not lock the VM."
>> It should be safe, since we don't update anything in this scenario.
>>
>> [1] For instance:
>> 2018-01-15 06:46:44,905-05 ... GetAllVmStatsVDSCommand ...
>> VdsIdVDSCommandParametersBase:{hostId='873a4d36-55fe-4be1-acb7-8de9c9123eb2'})
>> 2018-01-15 06:46:44,932-05 ... GetAllVmStatsVDSCommand ...
>> VdsIdVDSCommandParametersBase:{hostId='31f09289-ec6c-42ff-a745-e82e8ac8e6b9'})

> _______________________________________________
> Devel mailing list
> Devel@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/devel
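For readers following along, the proposed tweak boils down to a single guard in the monitoring path. Below is a minimal, illustrative Python sketch of that decision, not the actual ovirt-engine code (which is Java); the function and parameter names (`should_lock_vm`, `reporting_host`, `expected_host`) are invented for the example.

```python
def should_lock_vm(reported_status, reporting_host, expected_host):
    """Decide whether a monitoring cycle should take the VM lock before
    processing a host's report about the VM.

    The tweak from the thread: a 'Down' report from a host the VM is not
    expected to run on (e.g. the source host after a successful migration)
    carries nothing the engine needs to persist, so it can be handled
    without locking the VM and therefore cannot block a new migration.
    """
    if reported_status == "Down" and reporting_host != expected_host:
        return False  # unexpected host reporting Down: nothing to update
    return True       # every other report still takes the lock


# The failing scenario: after migrating from host-1 to host-0, host-1
# keeps reporting the VM as Down, but that report no longer needs the lock.
assert should_lock_vm("Down", reporting_host="host-1", expected_host="host-0") is False
# Reports from the host the VM is expected to run on behave as before.
assert should_lock_vm("Up", reporting_host="host-0", expected_host="host-0") is True
```

In this sketch the safety argument is the same as in the thread: the guard only skips locking in the one case where the engine updates nothing, so concurrent commands such as a second MigrateVmToServer can proceed.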