Hi Chris,
Perhaps you've run into https://community.nitrous.io/posts/stability-and-a-linux-oom-killer-bug. We saw symptoms similar to the ones you've described, and treating the above as the cause solved all of our issues.

Hope this helps!

--
Tom Arnfeld
Developer // DueDil

(+44) 7525940046
25 Christopher Street, London, EC2A 2BS

On Mon, Aug 31, 2015 at 11:55 PM, Christopher Ketchum <[email protected]> wrote:

> Hi all,
>
> I was running a Mesos cluster on EC2 with c4.8xlarge instance types when
> one of the status checks failed. We are running Mesos 0.22.1 on Ubuntu
> 14.04, with kernel version 3.13.0-55-generic. EC2 gave us this console
> output[1]. I did some searching and found similar issues reported here[2]
> on lkml, though those logs indicated a specific task and an older kernel,
> while these logs just show mesos-slave as the causative process.
> Unfortunately, the instance was terminated, so I'm not sure how much useful
> debugging can be done. Is this a known issue? We are also using our own
> Python executor; could an error there have caused this?
>
> [1] http://pastebin.com/NgHi8MnS
> [2] https://lkml.org/lkml/2014/9/30/498
>
> Thanks,
> Chris
