On Tue, Jan 25, 2011 at 02:43:43PM +0800, Wen Congyang wrote:
> The reason of libvirtd cores dump is that:
> We add vm->refs when we alloc the memory, and decrease it
> in the function qemuHandleMonitorEOF() in other thread.
>
> We add vm->refs in the function qemuConnectMonitor() and
> decrease it when the vm is inactive.
>
> The libvirtd will block in the function qemuMonitorSetCapabilities()
> because the vm is stopped by signal SIGSTOP. Now the vm->refs is 2.
>
> Then we kill the vm by signal SIGKILL. The function
> qemuMonitorSetCapabilities() failed, and then we will decrease vm->refs
> in the function qemuMonitorClose().
> In another thread, mon->fd is broken and the function
> qemuHandleMonitorEOF() is called.
>
> If qemuHandleMonitorEOF() decreases vm->refs before qemuConnectMonitor()
> returns, vm->refs will be decrease to 0 and the memory is freed.
>
> We will call qemudShutdownVMDaemon() as qemuConnectMonitor() failed.
> The memory has been freed, so qemudShutdownVMDaemon() is too dangerous.
>
> We will reference NULL pointer in the function
> virDomainConfVMNWFilterTeardown():
> =============
> void
> virDomainConfVMNWFilterTeardown(virDomainObjPtr vm) {
> int i;
>
> if (nwfilterDriver != NULL) {
> for (i = 0; i < vm->def->nnets; i++)
> virDomainConfNWFilterTeardown(vm->def->nets[i]);
> }
> }
> ============
> vm->def->nnets is not 0 but vm->def->nets is NULL(We don't set vm->def->nnets
> to 0 when we free vm).
>
> We should add an extra reference of vm to avoid vm to be deleted if
> qemuConnectMonitor() failed.
>
> Signed-off-by: Wen Congyang <[email protected]>
>
> ---
> src/qemu/qemu_driver.c | 31 ++++++++++++++++++++++---------
> 1 files changed, 22 insertions(+), 9 deletions(-)
>
> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
> index 34cc29f..613c2d4 100644
> --- a/src/qemu/qemu_driver.c
> +++ b/src/qemu/qemu_driver.c
> @@ -930,6 +930,10 @@ qemuReconnectDomain(void *payload, const char *name
> ATTRIBUTE_UNUSED, void *opaq
>
> priv = obj->privateData;
>
> + /* Hold an extra reference because we can't allow 'vm' to be
> + * deleted if qemuConnectMonitor() failed */
> + virDomainObjRef(obj);
> +
> /* XXX check PID liveliness & EXE path */
> if (qemuConnectMonitor(driver, obj) < 0)
> goto error;
> @@ -961,18 +965,27 @@ qemuReconnectDomain(void *payload, const char *name
> ATTRIBUTE_UNUSED, void *opaq
> if (obj->def->id >= driver->nextvmid)
> driver->nextvmid = obj->def->id + 1;
>
> - virDomainObjUnlock(obj);
> + if (virDomainObjUnref(obj) > 0)
> + virDomainObjUnlock(obj);
> return;
>
> error:
> - /* We can't get the monitor back, so must kill the VM
> - * to remove danger of it ending up running twice if
> - * user tries to start it again later */
> - qemudShutdownVMDaemon(driver, obj, 0);
> - if (!obj->persistent)
> - virDomainRemoveInactive(&driver->domains, obj);
> - else
> - virDomainObjUnlock(obj);
> + if (!virDomainObjIsActive(obj)) {
> + if (virDomainObjUnref(obj) > 0)
> + virDomainObjUnlock(obj);
> + return;
> + }
> +
> + if (virDomainObjUnref(obj) > 0) {
> + /* We can't get the monitor back, so must kill the VM
> + * to remove danger of it ending up running twice if
> + * user tries to start it again later */
> + qemudShutdownVMDaemon(driver, obj, 0);
> + if (!obj->persistent)
> + virDomainRemoveInactive(&driver->domains, obj);
> + else
> + virDomainObjUnlock(obj);
> + }
> }
On closer examination I see why this change is required.
Normally we would be doing qemuDomainObjBeginJob before
doing anything with the monitor and that grabs an extra
reference.
ACK
Daniel
--
libvir-list mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/libvir-list