Hi, @zhitao

> the `/metrics/snapshot` could take 10-30 seconds to respond.

Do you mean that `/metrics/snapshot` takes 10~30 seconds to return a result?
Or that `/metrics/snapshot` takes 10~30 seconds to reflect a change in the
`allocator/mesos/event_queue_dispatches` gauge?
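
One way to tell the two cases apart might be to time the request itself and read the gauge from the same response. This is only a rough sketch: `timed_snapshot` and its `fetch` parameter are hypothetical helper names (not part of Mesos), and it assumes the snapshot is a flat JSON object keyed by metric name.

```python
import json
import time
import urllib.request

# Gauge name taken from the original report.
GAUGE = "allocator/mesos/event_queue_dispatches"


def timed_snapshot(url, fetch=None):
    """Fetch one metrics snapshot; return (elapsed_seconds, gauge_value).

    `fetch` is a hypothetical hook that takes a URL and returns the raw
    response body. It defaults to a plain HTTP GET.
    """
    if fetch is None:
        def fetch(u):
            with urllib.request.urlopen(u) as resp:
                return resp.read()

    start = time.monotonic()
    body = fetch(url)
    elapsed = time.monotonic() - start

    # Assumption: the body is a JSON object mapping metric names to values.
    metrics = json.loads(body)
    return elapsed, metrics.get(GAUGE)
```

Calling `timed_snapshot("http://<master-host>:<port>/metrics/snapshot")` a few times while the queue is backed up would show whether the HTTP response itself is slow (large `elapsed`) or whether it returns quickly but the gauge value lags behind.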

On Mon, Dec 19, 2016 at 1:11 PM, Zhitao Li <zhitaoli...@gmail.com> wrote:

> Hi all,
>
> While I was debugging an allocator message queue build-up issue on master
> (which I plan to share in another thread), I noticed that `/metrics/snapshot`
> is also badly affected.
>
> For example, when the allocator queue has ~3k dispatches in it (revealed by
> the allocator/mesos/event_queue_dispatches gauge), the `/metrics/snapshot`
> could take 10-30 seconds to respond.
>
> During active debugging or an outage, this is pretty undesirable.
>
> My guess is that much of the stats collection code relies on *deferring* to
> another libprocess process and collecting the result.
>
> Should we explore a more reliable way to track metrics independently from
> libprocess's queue?
>
> --
> Cheers,
>
> Zhitao Li
>



-- 
Best Regards,
Haosdent Huang
