Re: monitoring mesos master load

2018-10-12 Thread Benjamin Mahler
The following are probably what you're looking for:
https://issues.apache.org/jira/browse/MESOS-9237
https://issues.apache.org/jira/browse/MESOS-9236
On Fri, Oct 12, 2018 at 12:02 PM Eric Chung wrote:
> Hello devs,
>
> We recently had an incident where the master was overloaded by the
>

monitoring mesos master load

2018-10-12 Thread Eric Chung
Hello devs,
We recently had an incident where the master was overloaded by the scheduler's ACKNOWLEDGE requests, causing the HTTP API latencies to spike. I have two questions:
- What is the best way to instrument the HTTP API to emit latency metrics?
- What's the best way to monitor the master's

Re: Mesos Flakiness Statistics

2018-10-12 Thread Benjamin Mahler
Thanks for sending this, Benno! I, for one, would love to see more regular communication about the state of CI, especially so that I know how I can help fix tests (right now I don't know which flaky tests are in areas I am maintaining). Is there any reason the first portion of the test name is being

Mesos Flakiness Statistics

2018-10-12 Thread Benno Evers
Hey all, as you might know, we've set up an internal CI system that is running `make check` on a variety of different platforms and configurations, 16 in total. As we've experienced more and more pain maintaining a green master, I've compiled some statistics about which tests are most flaky. I