On Thu, Apr 24, 2025 at 01:22:46PM +0200, Felix Huettner via Devel wrote:
> In case of a host that has a large number of cpus offline the count of
> host cpus and the last bit set in the virHostCPUGetOnlineBitmap might
> diverge significantly. This can e.g. be the case when disabeling smt via
> /sys/devices/system/cpu/smt/control.
> 
> On the host this looks like:
> ```
> $ cat /sys/devices/system/cpu/present
> 0-255
> $ cat /sys/devices/system/cpu/online
> 0-127
> ```
> 
> However in this case virBitmapToData previously only allocated 16 bytes
> for the output bitmap. This is becase the last set bit is on the 15th
> byte.
> 
> Users of virHostCPUGetMap however rely on the "cpumap" containing enough
> space for all existing cpus (so they would expect 32 bytes in this case).
> E.g. cmdNodeCpuMap relies on this for its output. It will then actually
> read 32 bytes from the start of the "cpumap" address where in this case
> the last 16 of these bytes are uninitialized.
> 
> This manifests itself in flapping outputs of "virsh nodecpumap --pretty" like:
> ```
> $ virsh nodecpumap --pretty
> CPUs present:   256
> CPUs online:    128
> CPU map:        0-127,192,194,202
> 
> $ virsh nodecpumap --pretty
> CPUs present:   256
> CPUs online:    128
> CPU map:        0-127,192,194,197
> 
> $ virsh nodecpumap --pretty
> CPUs present:   256
> CPUs online:    128
> CPU map:        0-127,192,194,196-197
> ```
> 
> This in turn potentially causes users of this data to report wrong cpu
> counts.
> 
> Note that this only seems to happen with at least 256 phyiscal cpus
> where at least 128 are offline.
> 
> We fix this by preallocating the expected bitmap size.
> 
> Signed-off-by: Felix Huettner <felix.huettner@stackit.cloud>
> ---
>  src/util/virhostcpu.c | 14 +++++++++-----
>  1 file changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/src/util/virhostcpu.c b/src/util/virhostcpu.c
> index 5dbcc8987c..626faa88cf 100644
> --- a/src/util/virhostcpu.c
> +++ b/src/util/virhostcpu.c
> @@ -1091,22 +1091,26 @@ virHostCPUGetMap(unsigned char **cpumap,
>  {
>      g_autoptr(virBitmap) cpus = NULL;
>      int ret = -1;
> -    int dummy;
>  
>      virCheckFlags(0, -1);
>  
> +    ret = virHostCPUGetCount();

This sets 'ret' to a positive value....

> +
>      if (!cpumap && !online)
> -        return virHostCPUGetCount();
> +        return ret;
>  
>      if (!(cpus = virHostCPUGetOnlineBitmap()))
>          goto cleanup;

...in the failure scenario we now jump to 'cleanup'
with 'ret' still positive.

Better to use a different pattern with separate variables

  int ret = -1;
  int ncpus = virHostCPUGetCount();

   ...do stuff...

  ret = ncpus;
 cleanup:
  if (ret < 0)
    ...
  return ret;

>  
> -    if (cpumap)
> -        virBitmapToData(cpus, cpumap, &dummy);
> +    if (cpumap) {
> +        int len = (ret + CHAR_BIT) / CHAR_BIT;
> +        *cpumap = g_new0(unsigned char, len);
> +        virBitmapToDataBuf(cpus, *cpumap, len);
> +    }
> +
>      if (online)
>          *online = virBitmapCountBits(cpus);
>  
> -    ret = virHostCPUGetCount();
>  
>   cleanup:
>      if (ret < 0 && cpumap)
> 
> base-commit: c5a73f75bc6cae4f466d0a6344d5b3277ac9c2f4
> -- 
> 2.43.0
> 

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Reply via email to