Hi @zhitao,

> the `/metrics/snapshot` could take 10-30 seconds to respond.
Do you mean that `/metrics/snapshot` only returns its result after 10-30 seconds? Or that `/metrics/snapshot` takes 10-30 seconds to reflect a change in the `allocator/mesos/event_queue_dispatches` gauge?

On Mon, Dec 19, 2016 at 1:11 PM, Zhitao Li <zhitaoli...@gmail.com> wrote:

> Hi all,
>
> While I was debugging an allocator message queue build up issue on master
> (which I plan to share another thread), I noticed that `/metrics/snapshot`
> is also badly affected.
>
> For example, when the allocator queue has ~3k dispatches in it (revealed by
> the allocator/mesos/event_queue_dispatches gauge), the `/metrics/snapshot`
> could take 10-30 seconds to respond.
>
> During an active debugging or outage fighting, this is pretty undesired.
>
> My guess is that many stats collection code relies on *deferring* to
> another libprocess and collect the result.
>
> Should we explore a more reliable way to track metrics independently from
> libprocess's queue?
>
> --
> Cheers,
>
> Zhitao Li

--
Best Regards,
Haosdent Huang
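The C++ sketch below illustrates the queueing hypothesis from the quoted message. It does not use libprocess. `Actor`, `dispatchWork`, `deferredRead`, `directRead` and `demo` are hypothetical names for a single-threaded process with a FIFO mailbox. In this sketch, a gauge read that is deferred onto the busy process waits behind every queued dispatch. A gauge backed by an atomic can be read without touching the queue at all. Whether Mesos's gauges behave exactly like the deferred variant is an assumption, not something confirmed in this thread.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <iostream>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical single-threaded "actor" that runs queued closures in FIFO order.
// This is NOT the libprocess API; it only models a process with one mailbox.
class Actor {
 public:
  Actor() : worker_([this] { run(); }) {}

  ~Actor() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stopping_ = true;
    }
    cv_.notify_one();
    worker_.join();  // the worker drains any remaining closures, then exits
  }

  void dispatch(std::function<void()> f) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push_back(std::move(f));
    }
    cv_.notify_one();
  }

  // Simulates an expensive allocator-style dispatch (hypothetical cost).
  void dispatchWork(std::chrono::milliseconds cost) {
    dispatch([this, cost] {
      std::this_thread::sleep_for(cost);
      processed_.fetch_add(1);
    });
  }

  // "Deferred" gauge: the read itself is queued, so it runs only after
  // every dispatch that was enqueued before it.
  std::future<long> deferredRead() {
    auto promise = std::make_shared<std::promise<long>>();
    std::future<long> result = promise->get_future();
    dispatch([this, promise] { promise->set_value(processed_.load()); });
    return result;
  }

  // "Direct" gauge: an atomic load that never touches the queue.
  long directRead() const { return processed_.load(); }

 private:
  void run() {
    for (;;) {
      std::function<void()> f;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return stopping_ || !queue_.empty(); });
        if (queue_.empty()) return;  // stopping and fully drained
        f = std::move(queue_.front());
        queue_.pop_front();
      }
      f();  // run outside the lock so producers are never blocked by work
    }
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> queue_;
  bool stopping_ = false;
  std::atomic<long> processed_{0};
  std::thread worker_;  // declared last so it starts after the members above exist
};

// Enqueues `backlog` slow dispatches, then compares the two gauge styles.
// Returns the value observed by the deferred read.
long demo(int backlog) {
  using namespace std::chrono;
  Actor actor;
  for (int i = 0; i < backlog; ++i) actor.dispatchWork(milliseconds(1));

  auto t0 = steady_clock::now();
  long direct = actor.directRead();            // timing-dependent value
  auto t1 = steady_clock::now();
  long deferred = actor.deferredRead().get();  // waits for the whole backlog
  auto t2 = steady_clock::now();

  std::cout << "direct:   " << direct << " in "
            << duration_cast<microseconds>(t1 - t0).count() << " us\n"
            << "deferred: " << deferred << " in "
            << duration_cast<milliseconds>(t2 - t1).count() << " ms\n";
  assert(deferred == backlog);  // FIFO + single worker: all prior work is done
  return deferred;
}
```

Calling `demo(300)` from a `main` shows the contrast. The deferred read blocks until the whole backlog has drained. The direct read returns immediately with whatever count it happens to observe. This is one way of reading the suggestion to track metrics "independently from libprocess's queue". The trade-off is that the tracked values must be safe to read concurrently, for example atomics.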