On Mon, Sep 22, 2025 at 11:30:46 +0800, Yong Huang wrote: > On Fri, Sep 19, 2025 at 8:23 PM Peter Krempa <pkre...@redhat.com> wrote: > > > On Fri, Sep 19, 2025 at 17:09:07 +0800, yong.hu...@smartx.com wrote: > > > From: Hyman Huang <yong.hu...@smartx.com> > > > > > > When saving the domain status in the path of the virDomainObjSave > > > function, Libvirtd will omit the private data if the format > > > interface is not provided. > > > > > > When the Libvirtd service is starting up and the driver has not > > > yet been initialized, this behavior is justified. > > > > > > However, when the service is shutting down, the driver is being > > > finalized and the interface is being released, a migration job > > > or another job may call the qemuDomainSaveStatus and save the > > > domain status at the same time. For the latter, this behavior > > > > I had another look and `virQEMUDriverPrivateDataCallbacks` is a global > > variable in the qemu driver which is just populated and never changed. > > > > This struct is then copied (by assignment) into the virDomainXMLOption > > instance that is used inside the qemu driver. Thus while this does have > > a private copy of the function pointers. The virDomainXMLOption > > instance lives the whole time the virt drivers live, so it's only ever > > cleared in virStateCleanup, which in the daemons happens right before > > exit. > > > > Now looking at the formatter code (let's use your patch as an example): > > > > --- > > src/conf/domain_conf.c | 21 ++++++++++++++++++--- > > 1 file changed, 18 insertions(+), 3 deletions(-) > > > > diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c > > index 281846dfbe..74af08e584 100644 > > --- a/src/conf/domain_conf.c > > +++ b/src/conf/domain_conf.c > > @@ -29542,9 +29542,24 @@ virDomainObjFormat(virDomainObj *obj, > > obj->deprecations[i]); > > } > > =20 > > - if (xmlopt->privateData.format && > > - xmlopt->privateData.format(&buf, obj) < 0) > > - return NULL; > > + if (xmlopt->privateData.format) { > > + if (xmlopt->privateData.format(&buf, obj) < 0) > > > > Here you see that the code assumes that 'xmlopt' is *not NULL*. Inside > > of XMLopt you have only pointers that don't change. > > > > As of such the 'format' callback being NULL is only if something > > overwrote the data inside 'xmlopt', which we don't do anywhere because > > it doesn't make sense. > > > > The other possibility is that 'xmlopt' is used after free, thus was > > randomly overwritten, but that would be a much different kind of bug and > > your fix wouldn't fix that at all. > > > > > > > causes the XML to be saved without private information (such as > > > monitor path and qemuCaps), which is required for the Libvirtd > > > service to manage a VM during startup. As a result, after restarting > > > the Libvirtd service, a running VM that is being migrated previously > > > might escape management. > > > > > > Thus, when formatting the status XML for a running virtual machine, > > > we need to presume that the "privateData.format" interface is present. > > > > > > Add a thorough check in the virDomainObjSave path to make sure that > > > private data in the status XML file always exists for the running VM > > > so that we won't lose it after restart the Libvirtd service. > > > > Now this outlines the implications of losing the private data but > > doesn't really give any information how this happened. > > > > Can you please clarify: > > 1) What did you do to trigger this bug? What was the state of the host? > > 2) How did you figure out that what you claim happened? > > 3) Can you reproduce the bug? > > > > If you can reproduce the bug please make sure to collect debug logs of > > libvirtd, and potentially consider running the daemon inside e.g. > > valgrind to see if it reproduces with such heavy instrumentation. > > > > > 1. Add a debuginfo print patch on Libvirtd, if privateData.format does not > exist, log it: > > --- a/src/conf/domain_conf.c > +++ b/src/conf/domain_conf.c > @@ -30186,6 +30186,11 @@ virDomainObjFormat(virDomainObjPtr obj, > xmlopt->privateData.format(&buf, obj) < 0) > return NULL; > > + if (!xmlopt->privateData.format) { > + VIR_WARN("PrivateData formatter driver does not exist"); > + } > + > > 2. Add a private patch(Not the complete code): > > Stop process if malformed config found > > Malformed material may prevent the virtual machine from > loading in its live state, according to the configuration > XML. If such a scenario was found by the Libvirt service > during startup, report an error and terminate.
Okay I had a deeper look and I now understand what you wanted to do by this code. It's also completely irelevant to the use-after-free bug. Note that we do not want this behaviour, because it prevents the whole deamon from starting. We need to fix the root cause rather than cover up. Losing a VM is obviously very bad, but fixing the root cause for it is what we should focus on. Please provide the unabreviated debug log and/or a *full* backtrace of all threads if you can attach a debugger to the daemon and setup a breakpoint at the point where the formatting of XML would use the freed data.