On 5/13/26 18:50, Gregory Price wrote:
> When doing aggressive overcommit of VMs on a single host, a pull
> model of stat retrieval is problematic if a guest becomes some form
> of unresponsive.  In particular, it's difficult to discern the
> difference between a hung guest and a slow guest - and why the
> guest is experiencing that.
> 
> Add VIRTIO_BALLOON_F_STATS_PUSH feature that allows the host to
> configure the guest to push stats on a timer instead of the default
> pull model.
> 
> The host sets stats_push_interval_ms in the balloon config space:
>   0     = disabled (pull-only, default)
>   N > 0 = guest pushes stats every N milliseconds
> 
> The push mode reuses the existing stats VQ, same buffer format,
> same tags. The host can change the interval at runtime by updating
> the config field.
> 
> Push mode provides two advantages over pull:
>   1. Guest liveness detection: in pull mode, the host cannot
>      distinguish a slow guest from a hung guest without implementing
>      its own timeout tracking. In push mode, the absence of expected
>      stats buffers is an implicit liveness signal; if the guest
>      fails to push within the expected interval, the host can
>      conclude it is unresponsive.
>   2. Latency-sensitive consumers (e.g., memory pressure response
>      loops) receive fresh stats at a guaranteed cadence without
>      the host needing to poll.
> 
> STATS_PUSH requires STATS_VQ; the driver clears STATS_PUSH during
> feature validation if STATS_VQ is absent. When push mode is active,
> the pull callback is suppressed to avoid racing on buffer submission.
> 
> The pull model remains available and is the default.

I don't quite see the big benefit here, really: either it's a timer in the
hypervisor or a timer in the VM. A slow VM will, in either model, delay the
update of stats.
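
To illustrate what I mean, here is a rough sketch of the host-side bookkeeping.
All names are hypothetical (nothing here is taken from the patch or from any
existing hypervisor), and it assumes a monotonic millisecond clock. As far as I
can tell, the host needs the same deadline check whether it requests the stats
or merely waits for them:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical host-side state; none of these names come from the patch. */
struct stats_watch {
    uint64_t last_stats_ms;  /* when the last stats buffer arrived */
    uint64_t interval_ms;    /* pull period, or stats_push_interval_ms */
    uint64_t grace_ms;       /* tolerated delay before flagging the guest */
};

/* Record the arrival of a stats buffer; identical for pull and push. */
static void stats_watch_update(struct stats_watch *w, uint64_t now_ms)
{
    w->last_stats_ms = now_ms;
}

/*
 * True if no stats arrived within interval + grace.
 * Assumes now_ms comes from a monotonic clock, so now_ms >= last_stats_ms.
 */
static bool stats_watch_overdue(const struct stats_watch *w, uint64_t now_ms)
{
    return now_ms - w->last_stats_ms > w->interval_ms + w->grace_ms;
}
```

In both models the host would call stats_watch_update() when a buffer shows up
on the stats VQ and stats_watch_overdue() from its own timer, so push mode does
not seem to remove the need for timeout tracking in the hypervisor.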

If you need some "liveness detection", are virtio-balloon stats updates really
the right mechanism?

I don't quite understand the "Latency-sensitive consumers" problem. If the VM is
slow, it is slow and will mess with latency-sensitive consumers either way?

-- 
Cheers,

David
