On 18/12/14 11:24, Caio Begotti wrote:
> Folks, I just wanted to share my experience with Juju during the last
> few months using it for real at work. I know it's pretty long but stay
> with me as I wanted to see if some of these points are bugs, design
> decisions or if we could simply talk about them :-)
>
> General:
>
> 1. Seems that if you happen to have more than... say, 30 machines, Juju
> starts behaving weirdly until you remove unused machines. One of the
> weird things is that new deploys all stay stuck with a pending status.
> That happened at least 4 times, so now I always destroy-environment when
> testing things just in case. Has anyone else seen this behaviour? Can
> this be because of LXC with Juju local? I do a lot of Juju testing so it's
> not unusual for me to have a couple hundred machines after a month, by
> the way.

I'll answer this one now.

This is due to "not enough file handles". It seems that the LXC containers that get created inherit the handles of the parent process, which is the machine agent. After a certain number of machines, and it may be around 30, the new machines start failing to recognise the new upstart script because inotify isn't working properly. This means the agents don't start and don't tell the state server they are running, so the machines stay pending even though lxc says "yep you're all good".

I'm not sure how big we can make the "limit nofile" in the agent upstart script without it causing problems elsewhere.

Tim
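
P.S. One rough way to check whether a machine agent is getting close to its file handle limit is to compare the number of open descriptors under /proc with the soft "Max open files" value. The sketch below is only an illustration for Linux hosts; the function names `parse_nofile_limit` and `fd_usage` are made up for this example and are not part of Juju.

```python
import os


def parse_nofile_limit(limits_text):
    """Return the soft 'Max open files' limit from /proc/<pid>/limits text.

    Returns None if the line is missing or the limit is 'unlimited'.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Fields: Max, open, files, <soft>, <hard>, <units>
            soft = line.split()[3]
            return int(soft) if soft.isdigit() else None
    return None


def fd_usage(pid="self"):
    """Return (open_fd_count, soft_limit) for a process (Linux /proc only)."""
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    with open(f"/proc/{pid}/limits") as f:
        limit = parse_nofile_limit(f.read())
    return open_fds, limit
```

Calling `fd_usage(pid)` with the PID of the machine agent (this usually needs root or the same user) gives the two numbers to compare. If the count sits near the limit when deploys start hanging in pending, that would be consistent with the explanation above, though it does not prove it.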