On 6/16/26 15:57, Gregory Price wrote:
> On Tue, Jun 16, 2026 at 02:33:43PM +0200, David Hildenbrand (Arm) wrote:
>> On 5/13/26 18:50, Gregory Price wrote:
>>>
>>> The pull model remains available and is the default.
>>
>> I don't quite see the big benefit here, really: either it's a timer in the
>> hypervisor or a timer in the VM. A slow VM will, in either model, delay the
>> update of stats.
>>
>> If you need some "liveness detection", is virtio-balloon stats updates really
>> the right mechanism?
>>
>> I don't quite understand the "Latency-sensitive consumers" problem. If the
>> VM is slow, it is slow and will mess with latency-sensitive consumers in
>> either way?
>>
> 
> Latency sensitive here should probably be defined as "Does not like
> blocking operations".  This was prototyped in the context of
> cloud-hypervisor [1] and an orchestrator trying to poll 1000 VMs on a
> single machine for stats.
> 
> The poller couldn't determine the difference between "guest is slow" and
> "guest is hung" and so had to block on the operation (I didn't see how
> to solve this async).
> 
> Similarly, having a single thread just round-robin poll the VMs is
> bluntly inefficient and provides poor guarantees about the liveliness
> of the stats (a couple of slow guests can cause other guests' stats to
> become stale for tens of seconds).
> 
> Definitely an RFC here because I'm not sure if I was missing something
> that might help me solve the problem.

Well, in QEMU we just run a timer internally that does the polling.

Then, upper layers in the stack can ask QEMU for the latest stats.

There, you just get the stats along with a "last-update" timestamp.
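
As a rough sketch of what that looks like from the orchestrator side (not
taken from any existing tool): the two QMP commands below use the
virtio-balloon QOM properties "guest-stats-polling-interval" and
"guest-stats". The device path assumes the balloon was created with
id=balloon0, and the helper names and the staleness threshold are made up
for illustration. Treating a "last-update" of 0 as "never updated" is also
an assumption here.

```python
import time

# Assumed QOM path: only valid if the device was created with id=balloon0.
BALLOON_PATH = "/machine/peripheral/balloon0"


def enable_polling_cmd(interval_s, path=BALLOON_PATH):
    """QMP command asking QEMU to poll the guest every interval_s seconds."""
    return {
        "execute": "qom-set",
        "arguments": {
            "path": path,
            "property": "guest-stats-polling-interval",
            "value": interval_s,
        },
    }


def get_stats_cmd(path=BALLOON_PATH):
    """QMP command that reads the stats QEMU last collected from the guest."""
    return {
        "execute": "qom-get",
        "arguments": {"path": path, "property": "guest-stats"},
    }


def stats_age(reply, now=None):
    """Seconds since the last stats update, or None if there was none.

    `reply` is the decoded QMP response to get_stats_cmd(). A "last-update"
    of 0 is treated as "never updated" (assumption).
    """
    last = reply["return"]["last-update"]
    if last == 0:
        return None
    now = time.time() if now is None else now
    return now - last


def is_stale(reply, max_age_s, now=None):
    """True if the stats are missing or older than max_age_s seconds."""
    age = stats_age(reply, now)
    return age is None or age > max_age_s
```

The point is that the caller never waits on the guest: it reads whatever
QEMU has cached and decides from "last-update" whether that is fresh enough,
so a slow or hung guest only ages its own stats.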

-- 
Cheers,

David
