So we're running into this failure mode again at one of our sites. Specifically, the system is running with a reasonable number of nodes (~100) and has been up for a while. It appears that it wanted to restart itself (I don't think it restarted jujud, but I do think it at least restarted a lot of the workers). Anyway, we have a fair number of things that we "exec" during startup (kvm-ok, restarting rsyslog, etc.). But once we get into this situation (whatever it actually is), we can't exec anything and we start getting failures.
Now, this *might* be a golang bug. When I was trying to debug it in the past, I wrote a small program that just allocated big chunks of memory (10MB strings, IIRC) and then tried to run "echo hello" until it started failing. IIRC, with swap disabled, the failure point was when the allocated memory reached 50% of total available memory. (I have 8GB of RAM, and it would start failing once the program had allocated 4GB of strings.)

When I dug into the golang code, it looked like they use clone(2) as the "create a new process for exec" function, and it seemed that this wasn't playing nicely with copy-on-write. At least, it appeared that instead of doing a simple copy-on-write clone that allocates no new memory and then exec'ing into the new process, it actually required enough RAM to be available for the new process. On the customer site, though, jujud has a RES size of only 1GB, they have 4GB of available RAM, and swap is enabled (2GB of 4GB swap currently in use).

The only workaround I can think of is to create a "forker" process right away at startup, which we just send RPC requests asking it to run a command for us and return the results. ATM I don't think we fork anything that runs interactively and needs its stdin/stdout file handles inside our process, so that approach should be feasible. I'd rather just have golang's fork() work even when the current process is using a large amount of RAM, though.

Any of the golang folks know what is going on?

John =:->
-- Juju-dev mailing list [email protected] Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/juju-dev
