On Thu, Nov 08, 2012 at 11:00:40 +0100, Michal Privoznik wrote:
> On 07.11.2012 23:23, Jiri Denemark wrote:
> >
> > AbortJob API was never a synchronous one and I don't think we need to change
> > it. Not to mention that this code unlocks and locks vm without holding an
> > extra reference to it. And even if it did so, it cannot guarantee that the
> > job
> > it's checking in the loop is the same one it tried to cancel. It's quite
> > unlikely but the original job could have finished and another one could have
> > been started while this code was sleeping.
>
> However, AbortJob is documented as synchronous. If current implementation
> doesn't
> follow docs it is a bug. On the other hand, I don't recall anybody screaming
> about
> it so far. But that means nothing, right?
IIRC the implementation was never synchronous in which case I think we may
just fix the documentation to match reality :-)
> >> @@ -1172,6 +1172,12 @@ qemuMigrationUpdateJobStatus(struct qemud_driver
> >> *driver,
> >> }
> >> priv->job.info.timeElapsed -= priv->job.start;
> >>
> >> + if (qemuDomainObjAbortAsyncJobRequested(vm)) {
> >> + VIR_DEBUG("Migration abort requested. Translating "
> >> + "status to MIGRATION_STATUS_CANCELLED");
> >> + status = QEMU_MONITOR_MIGRATION_STATUS_CANCELLED;
> >> + }
> >> +
> >
> > This check seems to be to late and that complicates the code later. If we
> > keep
> > qemuDomainAbortJob calling qemuMonitorMigrateCancel in qemuDomainAbortJob,
> > we
> > may count on that and check priv.job.asyncAbort before entering the monitor
> > to tell QEMU to start migrating in qemuMigrationRun. If the abort is not
> > requested by then, it may only happen after we call qemuMonitorMigrateTo*
> > and
> > in that case the migration will be properly aborted by
> > qemuMonitorMigrateCancel.
> >
>
> Not really. The issue I've seen was like this:
>
> Thread A Thread B
> 1. start async migration out job
> 2. Since job is set, abort it by
> calling 'migrate_cancel'
> 3. return to user
> 4. issue 'migrate' on the monitor
>
> And I think your suggestion makes it just harder to hit this race and not
> really avoid it.
> However with my code, we are guaranteed that 'migrate_cancel' will have an
> effect.
Oh right, we actually need to check priv.job.asyncAbort after entering the
monitor (since we may be preempted while waiting for entering the monitor) but
still before calling qemuMonitorMigrateTo* that actually starts the monitor.
When we entered the monitor, the AbortJob must have already been there or it
will come after we leave the monitor.
Jirka
--
libvir-list mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/libvir-list