On Thu, Apr 24, 2025 at 02:07:17PM +0100, Daniel P. Berrangé wrote:
> 
> On Thu, Apr 24, 2025 at 01:22:46PM +0200, Felix Huettner via Devel wrote:
> > In case of a host that has a large number of cpus offline the count of
> > host cpus and the last bit set in the virHostCPUGetOnlineBitmap might
> > diverge significantly. This can e.g. be the case when disabeling smt via
> > /sys/devices/system/cpu/smt/control.
> >
> > On the host this looks like:
> > ```
> > $ cat /sys/devices/system/cpu/present
> > 0-255
> > $ cat /sys/devices/system/cpu/online
> > 0-127
> > ```
> >
> > However in this case virBitmapToData previously only allocated 16 bytes
> > for the output bitmap. This is becase the last set bit is on the 15th
> > byte.
> >
> > Users of virHostCPUGetMap however rely on the "cpumap" containing enough
> > space for all existing cpus (so they would expect 32 bytes in this case).
> > E.g. cmdNodeCpuMap relies on this for its output. It will then actually
> > read 32 bytes from the start of the "cpumap" address where in this case
> > the last 16 of these bytes are uninitialized.
> >
> > This manifests itself in flapping outputs of "virsh nodecpumap --pretty" 
> > like:
> > ```
> > $ virsh nodecpumap --pretty
> > CPUs present:   256
> > CPUs online:    128
> > CPU map:        0-127,192,194,202
> >
> > $ virsh nodecpumap --pretty
> > CPUs present:   256
> > CPUs online:    128
> > CPU map:        0-127,192,194,197
> >
> > $ virsh nodecpumap --pretty
> > CPUs present:   256
> > CPUs online:    128
> > CPU map:        0-127,192,194,196-197
> > ```
> >
> > This in turn potentially causes users of this data to report wrong cpu
> > counts.
> >
> > Note that this only seems to happen with at least 256 phyiscal cpus
> > where at least 128 are offline.
> >
> > We fix this by preallocating the expected bitmap size.
> >
> > Signed-off-by: Felix Huettner <felix.huettner@stackit.cloud>
> > ---
> >  src/util/virhostcpu.c | 14 +++++++++-----
> >  1 file changed, 9 insertions(+), 5 deletions(-)
> >
> > diff --git a/src/util/virhostcpu.c b/src/util/virhostcpu.c
> > index 5dbcc8987c..626faa88cf 100644
> > --- a/src/util/virhostcpu.c
> > +++ b/src/util/virhostcpu.c
> > @@ -1091,22 +1091,26 @@ virHostCPUGetMap(unsigned char **cpumap,
> >  {
> >      g_autoptr(virBitmap) cpus = NULL;
> >      int ret = -1;
> > -    int dummy;
> >
> >      virCheckFlags(0, -1);
> >
> > +    ret = virHostCPUGetCount();
> 
> This sets 'ret' to a positive value....
> 
> > +
> >      if (!cpumap && !online)
> > -        return virHostCPUGetCount();
> > +        return ret;
> >
> >      if (!(cpus = virHostCPUGetOnlineBitmap()))
> >          goto cleanup;
> 
> ...in the failure scenario we now jump to 'cleanup'
> with 'ret' still positive.
> 
> Better to use a different pattern with separate variables
> 
>   int ret = -1;
>   int ncpus = virHostCPUGetCount();
> 
>    ...do stuff...
> 
>   ret = ncpus;
>  cleanup:
>   if (ret < 0)
>     ...
>   return ret;

Hi Daniel,

thanks for the review.

I will post a fixed v2.
I decided to get rid of the "goto cleanup" completely and just return -1
instead. "cleanup" seems to only have existed to deallocate cpumap, but
by the time we jump there cpumap can not possibly be allocated.
So i hope this makes it more easy to read.

Thanks a lot,
Felix

> 
> >
> > -    if (cpumap)
> > -        virBitmapToData(cpus, cpumap, &dummy);
> > +    if (cpumap) {
> > +        int len = (ret + CHAR_BIT) / CHAR_BIT;
> > +        *cpumap = g_new0(unsigned char, len);
> > +        virBitmapToDataBuf(cpus, *cpumap, len);
> > +    }
> > +
> >      if (online)
> >          *online = virBitmapCountBits(cpus);
> >
> > -    ret = virHostCPUGetCount();
> >
> >   cleanup:
> >      if (ret < 0 && cpumap)
> >
> > base-commit: c5a73f75bc6cae4f466d0a6344d5b3277ac9c2f4
> > --
> > 2.43.0
> >
> 
> With regards,
> Daniel
> --
> |: https://berrange.com/
> |: https://libvirt.org/
> |: https://entangle-photo.org/
> 

Reply via email to