On Thu, Nov 08, 2012 at 11:00:40 +0100, Michal Privoznik wrote:
> On 07.11.2012 23:23, Jiri Denemark wrote:
> > 
> > AbortJob API was never a synchronous one and I don't think we need to change
> > it. Not to mention that this code unlocks and locks vm without holding an
> > extra reference to it. And even if it did so, it cannot guarantee that the 
> > job
> > it's checking in the loop is the same one it tried to cancel. It's quite
> > unlikely but the original job could have finished and another one could have
> > been started while this code was sleeping.
> 
> However, AbortJob is documented as synchronous. If current implementation 
> doesn't
> follow docs it is a bug. On the other hand, I don't recall anybody screaming 
> about
> it so far. But that means nothing, right?

IIRC the implementation was never synchronous in which case I think we may
just fix the documentation to match reality :-)

> >> @@ -1172,6 +1172,12 @@ qemuMigrationUpdateJobStatus(struct qemud_driver 
> >> *driver,
> >>      }
> >>      priv->job.info.timeElapsed -= priv->job.start;
> >>  
> >> +    if (qemuDomainObjAbortAsyncJobRequested(vm)) {
> >> +        VIR_DEBUG("Migration abort requested. Translating "
> >> +                  "status to MIGRATION_STATUS_CANCELLED");
> >> +        status = QEMU_MONITOR_MIGRATION_STATUS_CANCELLED;
> >> +    }
> >> +
> > 
> > This check seems to be to late and that complicates the code later. If we 
> > keep
> > qemuDomainAbortJob calling qemuMonitorMigrateCancel in qemuDomainAbortJob, 
> > we
> > may count on that and check priv.job.asyncAbort before entering the monitor
> > to tell QEMU to start migrating in qemuMigrationRun. If the abort is not
> > requested by then, it may only happen after we call qemuMonitorMigrateTo* 
> > and
> > in that case the migration will be properly aborted by
> > qemuMonitorMigrateCancel.
> > 
> 
> Not really. The issue I've seen was like this:
> 
>    Thread A                                   Thread B
> 1. start async migration out job
> 2.                                            Since job is set, abort it by 
> calling 'migrate_cancel'
> 3.                                            return to user
> 4. issue 'migrate' on the monitor
> 
> And I think your suggestion makes it just harder to hit this race and not 
> really avoid it.
> However with my code, we are guaranteed that 'migrate_cancel' will have an 
> effect.

Oh right, we actually need to check priv.job.asyncAbort after entering the
monitor (since we may be preempted while waiting for entering the monitor) but
still before calling qemuMonitorMigrateTo* that actually starts the monitor.
When we entered the monitor, the AbortJob must have already been there or it
will come after we leave the monitor.

Jirka

--
libvir-list mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/libvir-list

Reply via email to