If it is overcommit, this Go issue makes for interesting (though possibly not helpful) reading: https://github.com/golang/go/issues/5838
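
To see which overcommit policy a machine is running, something like the following may help. It is only a sketch: it assumes a Linux host where /proc/sys/vm/overcommit_memory is readable, and the helper name parseOvercommit is made up for illustration. The three modes (0, 1, 2) are the ones documented for the kernel's vm.overcommit_memory setting.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseOvercommit (hypothetical helper) turns the raw contents of
// /proc/sys/vm/overcommit_memory into the mode number and a short description.
func parseOvercommit(raw string) (int, string, error) {
	mode, err := strconv.Atoi(strings.TrimSpace(raw))
	if err != nil {
		return 0, "", fmt.Errorf("unexpected overcommit value %q: %v", raw, err)
	}
	switch mode {
	case 0:
		return mode, "heuristic overcommit", nil
	case 1:
		return mode, "always overcommit", nil
	case 2:
		return mode, "strict accounting, no overcommit", nil
	}
	return mode, "", fmt.Errorf("unknown overcommit mode %d", mode)
}

func main() {
	// Assumption: Linux with procfs mounted at /proc.
	raw, err := os.ReadFile("/proc/sys/vm/overcommit_memory")
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read overcommit setting:", err)
		os.Exit(1)
	}
	mode, desc, err := parseOvercommit(string(raw))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("vm.overcommit_memory = %d (%s)\n", mode, desc)
}
```

If this reports mode 2, a fork or clone from a large process can be refused even though the child would exec straight away.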
On 3 June 2015 at 17:24, Gustavo Niemeyer <[email protected]> wrote:
> Hey John,
>
> It's probably an overcommit issue. Even if you don't have the memory in use,
> cloning it would mean the new process would have a chance to change that
> memory and thus require real memory pages, which the system obviously cannot
> give it. You can workaround that by explicitly enabling overcommit, which
> means the potential to crash late in strange places in the bad case, but
> would be totally okay for the exec situation.
>
> So we're running into this failure mode again at one of our sites.
>
> Specifically, the system is running with a reasonable number of nodes (~100)
> and has been running for a while. It appears that it wanted to restart
> itself (I don't think it restarted jujud, but I do think it at least
> restarted a lot of the workers.)
> Anyway, we have a fair number of things that we "exec" during startup
> (kvm-ok, restart rsyslog, etc).
> But when we get into this situation (whatever it actually is) then we can't
> exec anything and we start getting failures.
>
> Now, this *might* be a golang bug.
>
> When I was trying to debug it in the past, I created a small program that
> just allocated big slices of memory (10MB strings, IIRC) and then tried to
> run "echo hello" until it started failing.
> IIRC the failure point was when I wasn't using swap and the allocated memory
> was 50% of total available memory. (I have 8GB of RAM, it would start
> failing once we had allocated 4GB of strings).
> When I tried digging into the golang code, it looked like they use clone(2)
> as the "create a new process for exec" function. And it seemed it wasn't
> playing nicely with copy-on-write. At least, it appeared that instead of
> doing a simple copy-on-write clone without allocating any new memory and
> then exec into a new process, it actually required to have enough RAM
> available for the new process.
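
For anyone who wants to try this, the test John describes above could look roughly like the sketch below. It is not his original program: the names allocateChunk and tryExec, the use of byte slices rather than strings, the 4096-byte page stride, and the default cap of 16 chunks are all assumptions made for illustration. Pass a larger chunk count as the first argument to push towards the failure point.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strconv"
)

// chunkSize is 10MB per allocation, the size recalled in the quoted message.
const chunkSize = 10 << 20

// allocateChunk (hypothetical helper) returns a slice of the given size and
// writes one byte per 4096-byte stride so the pages are actually backed by
// memory rather than merely reserved. 4096 is an assumed page size.
func allocateChunk(size int) []byte {
	buf := make([]byte, size)
	for i := 0; i < len(buf); i += 4096 {
		buf[i] = 1
	}
	return buf
}

// tryExec (hypothetical helper) runs "echo hello" and returns any error from
// starting or waiting on the child process.
func tryExec() error {
	return exec.Command("echo", "hello").Run()
}

func main() {
	maxChunks := 16 // assumed default cap so a casual run stays small
	if len(os.Args) > 1 {
		n, err := strconv.Atoi(os.Args[1])
		if err != nil {
			fmt.Fprintln(os.Stderr, "usage: prog [max-chunks]")
			os.Exit(2)
		}
		maxChunks = n
	}

	var held [][]byte // keep references so the chunks are not collected
	for i := 1; i <= maxChunks; i++ {
		held = append(held, allocateChunk(chunkSize))
		if err := tryExec(); err != nil {
			fmt.Printf("exec failed with %d MB held: %v\n", i*10, err)
			return
		}
	}
	fmt.Printf("exec still working with %d MB held\n", len(held)*10)
}
```

Whether and where this fails will depend on the machine's RAM, swap and overcommit settings, so a run that never fails does not rule the problem out.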
>
> On the customer site, though, jujud has a RES size of only 1GB, and they
> have 4GB of available RAM and swap is enabled (2GB of 4GB swap currently in
> use).
>
> The only workaround I can think of is for us to create a "forker" process
> right away at startup that we just send RPC requests to run a command for us
> and return the results. ATM I don't think we do any fork and run
> interactively such that we need the stdin/stdout file handles inside our
> process.
>
> I'd rather just have golang fork() work even when the current process is
> using a large amount of RAM.
>
> Any of the golang folks know what is going on?
>
> John
> =:->
>
>
> --
> Juju-dev mailing list
> [email protected]
> Modify settings or unsubscribe at:
> https://lists.ubuntu.com/mailman/listinfo/juju-dev
