On 6/16/26 15:57, Gregory Price wrote:
> On Tue, Jun 16, 2026 at 02:33:43PM +0200, David Hildenbrand (Arm) wrote:
>> On 5/13/26 18:50, Gregory Price wrote:
>>>
>>> The pull model remains available and is the default.
>>
>> I don't quite see the big benefit here, really: either it's a timer in the
>> hypervisor or a timer in the VM. A slow VM will, in either model, delay the
>> update of stats.
>>
>> If you need some "liveness detection", is virtio-balloon stats updates really
>> the right mechanism?
>>
>> I don't quite understand the "Latency-sensitive consumers" problem. If the
>> VM is slow, it is slow and will mess with latency-sensitive consumers in
>> either way?
>>
>
> Latency sensitive here should probably be defined as "Does not like
> blocking operations". This was prototyped in the context of
> cloud-hypervisor [1] and an orchestrator trying to poll 1000 VMs on a
> single machine for stats.
>
> The poller couldn't determine the difference between "guest is slow" and
> "guest is hung" and so had to block on the operation (I didn't see how
> to solve this async).
>
> Similarly, having a single thread just round-robin poll the VMs is
> bluntly inefficient and provides poor guarantees about the liveliness
> of the stats (a couple slow guests can cause other guests' stats to
> become stale for 10s of seconds).
>
> Definitely an RFC here because I'm not sure if I was missing something
> that might help me solve the problem.

Well, in QEMU we just run a timer internally that does the polling. Then,
upper layers in the stack can ask QEMU for the latest stats. There, you just
get the stats along with a "last-update" timestamp.

-- 
Cheers,

David
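
The QEMU model described above (the hypervisor polls on its own timer, and consumers read a cached value plus its "last-update" timestamp) lets an orchestrator tell "fresh" from "stale" without ever blocking on a guest. Below is a minimal Python sketch of the consumer side, assuming each reply is the payload of a QMP `qom-get` on the balloon device's `guest-stats` property (a dict with "stats" and an integer "last-update" in seconds since the epoch), with polling enabled via the device's `guest-stats-polling-interval` property. `classify_guest_stats`, `sweep`, and the 10-second `STALE_AFTER_SECONDS` budget are hypothetical names and policy for illustration, not part of QEMU.

```python
import time
from typing import Dict, Optional

# Hypothetical orchestrator-side policy, not a QEMU setting: treat stats as
# stale once they are older than a few multiples of the polling interval.
STALE_AFTER_SECONDS = 10.0


def classify_guest_stats(reply: dict, now: Optional[float] = None,
                         stale_after: float = STALE_AFTER_SECONDS) -> str:
    """Classify one cached guest-stats reply without contacting the guest.

    Assumes `reply` looks like {"stats": {...}, "last-update": <epoch secs>}.
    A missing or zero timestamp is treated (by assumption) as "no update
    received from the guest yet".
    """
    if now is None:
        now = time.time()
    last_update = reply.get("last-update", 0)
    if last_update <= 0:
        return "never-updated"
    # Age of the cached sample; a slow or hung guest shows up as a growing
    # age rather than as a blocked read on the consumer side.
    if now - last_update > stale_after:
        return "stale"
    return "fresh"


def sweep(replies: Dict[str, dict], now: Optional[float] = None) -> Dict[str, str]:
    """Classify every VM from already-cached replies, using one shared clock."""
    if now is None:
        now = time.time()
    return {vm: classify_guest_stats(r, now) for vm, r in replies.items()}


if __name__ == "__main__":
    cached = {
        "vm-a": {"stats": {"stat-free-memory": 1 << 30}, "last-update": 1000},
        "vm-b": {"stats": {}, "last-update": 980},
        "vm-c": {"stats": {}, "last-update": 0},
    }
    print(sweep(cached, now=1005))
```

Because each read only inspects a cached value, one slow guest cannot delay the sweep over the other VMs; staleness becomes a per-VM property derived from the timestamp instead of a timeout on a blocking call.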

