Hi everyone,

Has anyone worked on crash report tooling for Mesos clusters? As part of our testing (both internal and of public RCs), we have been looking at tools we can run to 1) monitor running Mesos processes (unusual behavior in file descriptor usage, CPU load, etc.) and 2) grab information post-mortem (crawl endpoints, find the command-line arguments processes were started with, collect machine stats, and so on).
Even if different organizations use different tools for this, it would be awesome to join forces and agree on a common format. Do you have any thoughts or ideas? We could ship such a tool as part of the Mesos distribution or just host it under github.com/mesos.

Cheers,
Niklas
