Mesos is in kind of an odd state right now: Mesos itself is well ahead of Kubernetes as far as features go, but Marathon feels behind in many ways, especially compared to GKE (programmatic configuration of multiple pods and persistent disk transfer come to mind).
A major pain point we're trying to solve for Mesos is how to distribute logging and system metric monitoring across the cluster. For example, every node needs its journald logs exported somewhere, at least for the kernel. Figuring out what that infrastructure looks like is non-trivial (we export to SumoLogic), and figuring out how to run it in a Mesos / container friendly way is even less trivial.

In a similar vein, monitoring things like total CPU usage, total network throughput, disk queue depth, and other system level metrics requires some sort of measurement agent on all the machines. Figuring out how to control that in a container friendly and properly isolated way is also non-trivial. A pluggable "system metrics" snapshot taker that can be exposed as an extra field in the agent's /monitor/statistics (or similar) would be fantastic.

With regard to metrics and measuring of containers, having the Mesos agents register with all cgroups that they can expose through /monitor/statistics, just for executor tracking purposes, would be an awesome feature to implement.

Cheers,
Charles Allen

On Tue, Sep 19, 2017 at 1:13 PM Benjamin Mahler <bmah...@apache.org> wrote:

> Thanks for sharing this Charles, do you have any feedback (positive or
> negative) or feature requests for the Mesos project that you want to
> highlight on the list?
>
> On Fri, Sep 15, 2017 at 6:37 PM, Charles Allen <
> charles.al...@metamarkets.com> wrote:
>
> > Just fyi, a post went live that talks a bit about how we use Mesos at
> > Metamarkets to run Druid and Spark on the same machines.
> >
> > https://metamarkets.com/2017/druid-and-spark-together-
> > mixing-analytics-workflows/
> >
> > Cheers,
> > Charles Allen
> >
>