On Mon, Sep 22, 2025 at 11:30:46 +0800, Yong Huang wrote:
> On Fri, Sep 19, 2025 at 8:23 PM Peter Krempa <pkre...@redhat.com> wrote:
> 
> > On Fri, Sep 19, 2025 at 17:09:07 +0800, yong.hu...@smartx.com wrote:
> > > From: Hyman Huang <yong.hu...@smartx.com>
> > >
> > > When saving the domain status in the path of the virDomainObjSave
> > > function, Libvirtd will omit the private data if the format
> > > interface is not provided.
> > >
> > > When the Libvirtd service is starting up and the driver has not
> > > yet been initialized, this behavior is justified.
> > >
> > > However, when the service is shutting down, the driver is being
> > > finalized and the interface is being released, a migration job
> > > or another job may call the qemuDomainSaveStatus and save the
> > > domain status at the same time. For the latter, this behavior
> >
> > I had another look and `virQEMUDriverPrivateDataCallbacks` is a global
> > variable in the qemu driver which is just populated and never changed.
> >
> > This struct is then copied (by assignment)  into the virDomainXMLOption
> > instance that is used inside the qemu driver. Thus while this does have
> > a private copy of the function pointers. The virDomainXMLOption
> > instance lives the whole time the virt drivers live, so it's only ever
> > cleared in virStateCleanup, which in the daemons happens right before
> > exit.
> >
> > Now looking at the formatter code (let's use your patch as an example):
> >
> > ---
> >  src/conf/domain_conf.c | 21 ++++++++++++++++++---
> >  1 file changed, 18 insertions(+), 3 deletions(-)
> >
> > diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
> > index 281846dfbe..74af08e584 100644
> > --- a/src/conf/domain_conf.c
> > +++ b/src/conf/domain_conf.c
> > @@ -29542,9 +29542,24 @@ virDomainObjFormat(virDomainObj *obj,
> >                                obj->deprecations[i]);
> >      }
> > =20
> > -    if (xmlopt->privateData.format &&
> > -        xmlopt->privateData.format(&buf, obj) < 0)
> > -        return NULL;
> > +    if (xmlopt->privateData.format) {
> > +        if (xmlopt->privateData.format(&buf, obj) < 0)
> >
> > Here you see that the code assumes that 'xmlopt' is *not NULL*. Inside
> > of XMLopt you have only pointers that don't change.
> >
> > As of such the 'format' callback being NULL is only if something
> > overwrote the data inside 'xmlopt', which we don't do anywhere because
> > it doesn't make sense.
> >
> > The other possibility is that 'xmlopt' is used after free, thus was
> > randomly overwritten, but that would be a much different kind of bug and
> > your fix wouldn't fix that at all.
> >
> >
> > > causes the XML to be saved without private information (such as
> > > monitor path and qemuCaps), which is required for the Libvirtd
> > > service to manage a VM during startup. As a result, after restarting
> > > the Libvirtd service, a running VM that is being migrated previously
> > > might escape management.
> > >
> > > Thus, when formatting the status XML for a running virtual machine,
> > > we need to presume that the "privateData.format" interface is present.
> > >
> > > Add a thorough check in the virDomainObjSave path to make sure that
> > > private data in the status XML file always exists for the running VM
> > > so that we won't lose it after restart the Libvirtd service.
> >
> > Now this outlines the implications of losing the private data but
> > doesn't really give any information how this happened.
> >
> > Can you please clarify:
> >  1) What did you do to trigger this bug? What was the state of the host?
> >  2) How did you figure out that what you claim happened?
> >  3) Can you reproduce the bug?
> >
> > If you can reproduce the bug please make sure to collect debug logs of
> > libvirtd, and potentially consider running the daemon inside e.g.
> > valgrind to see if it reproduces with such heavy instrumentation.
> >
> >
> 1. Add a debuginfo print patch on Libvirtd, if privateData.format does not
> exist, log it:
> 
> --- a/src/conf/domain_conf.c
> +++ b/src/conf/domain_conf.c
> @@ -30186,6 +30186,11 @@ virDomainObjFormat(virDomainObjPtr obj,
>          xmlopt->privateData.format(&buf, obj) < 0)
>          return NULL;
> 
> +    if (!xmlopt->privateData.format) {
> +        VIR_WARN("PrivateData formatter driver does not exist");
> +    }
> +
> 
> 2. Add a private patch(Not the complete code):
> 
>     Stop process if malformed config found
> 
>     Malformed material may prevent the virtual machine from
>     loading in its live state, according to the configuration
>     XML. If such a scenario was found by the Libvirt service
>     during startup, report an error and terminate.

Okay I had a deeper look and I now understand what you wanted to do by
this code. It's also completely irelevant to the use-after-free bug.

Note that we do not want this behaviour, because it prevents the whole
deamon from starting.

We need to fix the root cause rather than cover up.

Losing a VM is obviously very bad, but fixing the root cause for it is
what we should focus on.

Please provide the unabreviated debug log and/or a *full* backtrace of
all threads if you can attach a debugger to the daemon and setup a
breakpoint at the point where the formatting of XML would use the freed
data.

Reply via email to