This is great; I would love to see metrics collection be low latency. Here is the response I gave last time, with the 4 options I think we should tackle: https://www.mail-archive.com/dev@mesos.apache.org/msg37113.html
IMHO we should defer looking into the 5th option of sampling/caching internally until we've thoroughly attempted to avoid tripping through the Process queue, because if we succeed it will become unnecessary.

On Mon, May 22, 2017 at 1:26 PM, Zhitao Li <zhitaoli...@gmail.com> wrote:

> Thanks for the feedback, James.
>
> Replying to your points inline:
>
> On Mon, May 22, 2017 at 10:56 AM, James Peach <jor...@gmail.com> wrote:
>
> > On May 19, 2017, at 11:35 AM, Zhitao Li <zhitaoli...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I'd like to start a conversation about the behavior of the metrics collection endpoints (especially `/metrics/snapshot`).
> > >
> > > Right now, these endpoints are served from the same master/agent libprocess, and extensively use `gauge` to chain further callbacks to collect various metrics (the DRF allocator specifically adds several metrics per role).
> > >
> > > This brings a problem when the system is under load: when the master/allocator libprocess becomes busy, stats collection itself becomes slow too. Flying dark when the system is under load is especially painful for an operator.
> >
> > Yes, sampling metrics should approach zero cost.
> >
> > > I would like to explore the direction of isolating metric collection even when the master is slow. A couple of ideas:
> > >
> > > - (short term) reduce usage of gauge and prefer counter (since I believe they are less affected);
> >
> > I'd rather not squash the semantics for performance reasons. If a metric has gauge semantics, I don't think we should represent that as a Counter.
>
> I recall that I had a previous conversation with @bmahler and he thought that certain gauges could be expressed as the differential of two counters.
>
> We definitely cannot express a gauge as a Counter, because a gauge's value can decrease, while a counter should always be treated as monotonically increasing until the process restarts.
>
> > > - alternative implementation of `gauge` which does not contend on the master/allocator's event queue;
> >
> > This is doable in some circumstances, but not always. For example, Master::_uptime_secs() doesn't need to run on the master queue, but Master::_outstanding_offers arguably does. The latter could be implemented by sampling a variable that is updated, but that's not very generic, so we should try to think of something better.
>
> I agree that this will not be a trivial cut. The fact that this is not something trivially achievable is the primary reason I want to start this conversation with the general dev community. We can absorb some work to optimize certain hot paths (I suspect the role-specific ones in the allocator are among them for us), but maintaining this in the long term will definitely require help from all contributors.
>
> w.r.t. examples, it seems that Master::_outstanding_offers simply calls hashmap::size() on a hashmap object, so if the underlying container type conforms to C++11's thread safety requirements (http://en.cppreference.com/w/cpp/container#Thread_safety), we should at least be able to call size() with the understanding that we might get a slightly stale value?
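As an illustration of the "sampling a variable that is updated" idea, here is a rough standalone sketch (this is not the actual libprocess Gauge API, and the names are made up): the owning actor updates an atomic as part of its normal state changes, and the metrics path only reads that atomic, so sampling never dispatches onto the master's queue and at worst observes a slightly stale value.

// Standalone sketch (not the actual libprocess API): a gauge whose value is
// kept in an atomic that the owning actor updates on its own queue, so that
// sampling it never has to dispatch onto that queue.
#include <atomic>
#include <cstdint>
#include <iostream>

class SampledGauge
{
public:
  // Reading is wait-free and can safely happen from the metrics endpoint,
  // with the understanding that the value may be slightly stale.
  int64_t value() const { return value_.load(std::memory_order_relaxed); }

  // The owner (e.g. the master, while already running on its own event
  // queue) updates the gauge as part of the state change itself.
  void increment() { value_.fetch_add(1, std::memory_order_relaxed); }
  void decrement() { value_.fetch_sub(1, std::memory_order_relaxed); }

private:
  std::atomic<int64_t> value_{0};
};

int main()
{
  SampledGauge outstandingOffers;

  // Hypothetical call sites: the master would bump the gauge when it sends
  // an offer and drop it when the offer is accepted, declined or rescinded.
  outstandingOffers.increment();
  outstandingOffers.increment();
  outstandingOffers.decrement();

  std::cout << "master/outstanding_offers = " << outstandingOffers.value()
            << std::endl;
  return 0;
}

The trade-off is the same one noted for hashmap::size(): a reader may see a value that is one update behind, which seems acceptable for metrics.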
> I think a more interesting example is Master::_task_starting(): not only is it not calculated by a simple const method, but the result is actually generated by iterating over all tasks registered to the master. This means the cost of calculating it is linear in the number of tasks in the cluster.
>
> > > - serving metrics collection from a different libprocess routine.
> >
> > See MetricsProcess. One (mitigation?) approach would be to sample the metrics at a fixed rate and then serve the cached samples from the MetricsProcess. I expect most installations have multiple clients sampling the metrics, so this would at least decouple the sample rate from the metrics request rate.
>
> That sounds like a very good idea to start with. I still think certain code should be augmented so that it maintains the gauge value more efficiently, but for code which is harder to rewrite this can definitely improve the situation.
>
> > > Any thoughts on these?
> > >
> > > --
> > > Cheers,
> > >
> > > Zhitao Li
>
> --
> Cheers,
>
> Zhitao Li
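To make the fixed-rate sampling / cached snapshot idea concrete, below is a rough standalone sketch (not the actual MetricsProcess; all names here are invented): a background loop invokes the potentially expensive gauge callbacks at a fixed interval, and snapshot requests are served from the cached copy, so the sample rate is decoupled from the request rate.

// Standalone sketch (not the actual MetricsProcess) of decoupling the
// sample rate from the request rate: a background loop samples the
// (potentially expensive) gauges at a fixed interval, and
// /metrics/snapshot-style requests are served from the cached copy.
#include <atomic>
#include <chrono>
#include <functional>
#include <iostream>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

class CachedMetrics
{
public:
  explicit CachedMetrics(std::chrono::milliseconds interval)
    : interval_(interval), running_(true), sampler_([this]() { loop(); }) {}

  ~CachedMetrics()
  {
    running_ = false;
    sampler_.join();
  }

  // Registering a gauge only stores the (possibly expensive) callback;
  // it is invoked by the sampling loop, never by the HTTP request path.
  void add(const std::string& name, std::function<double()> gauge)
  {
    std::lock_guard<std::mutex> lock(mutex_);
    gauges_[name] = std::move(gauge);
  }

  // What an endpoint handler would call: returns the last sampled values.
  std::map<std::string, double> snapshot() const
  {
    std::lock_guard<std::mutex> lock(mutex_);
    return cache_;
  }

private:
  void loop()
  {
    while (running_) {
      std::map<std::string, std::function<double()>> gauges;
      {
        std::lock_guard<std::mutex> lock(mutex_);
        gauges = gauges_;
      }

      std::map<std::string, double> fresh;
      for (const auto& entry : gauges) {
        fresh[entry.first] = entry.second(); // The expensive part.
      }

      {
        std::lock_guard<std::mutex> lock(mutex_);
        cache_ = std::move(fresh);
      }

      std::this_thread::sleep_for(interval_);
    }
  }

  const std::chrono::milliseconds interval_;
  mutable std::mutex mutex_;
  std::map<std::string, std::function<double()>> gauges_;
  std::map<std::string, double> cache_;
  std::atomic<bool> running_;
  std::thread sampler_;
};

int main()
{
  CachedMetrics metrics(std::chrono::milliseconds(100));

  // Hypothetical expensive gauge, standing in for something like
  // master/tasks_starting that walks all tasks registered to the master.
  metrics.add("master/tasks_starting", []() { return 42.0; });

  std::this_thread::sleep_for(std::chrono::milliseconds(250));

  for (const auto& entry : metrics.snapshot()) {
    std::cout << entry.first << " = " << entry.second << std::endl;
  }
  return 0;
}

With something along these lines, many clients polling /metrics/snapshot would not each pay the cost of walking all tasks for a gauge like the one above; they would simply read the most recent cached sample.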