Not LGTM. The number of CPU cores available on the default EC2 bootstrap machine is 1.
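On a one-core instance, runtime.GOMAXPROCS(runtime.NumCPU()) is a no-op, so the patch as written wouldn't change anything on the default bootstrap node. A minimal sketch for confirming what the runtime sees there (GOMAXPROCS(0) queries the current setting without changing it):

package main

import (
	"fmt"
	"runtime"
)

func main() {
	// GOMAXPROCS(0) reports the current value without modifying it.
	fmt.Printf("NumCPU=%d GOMAXPROCS=%d\n",
		runtime.NumCPU(), runtime.GOMAXPROCS(0))
}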
On Tue, Oct 29, 2013 at 5:07 PM, John Arbash Meinel
<[email protected]> wrote:
> Do we want to enable multiprocessing for jujud? I have some evidence
> that it would actually help things.
>
> I'm soliciting feedback about this patch:
>
> === modified file 'cmd/jujud/main.go'
> --- cmd/jujud/main.go   2013-09-13 14:48:13 +0000
> +++ cmd/jujud/main.go   2013-10-28 17:47:52 +0000
> @@ -8,6 +8,7 @@
>      "net/rpc"
>      "os"
>      "path/filepath"
> +    "runtime"
>
>      "launchpad.net/juju-core/cmd"
>      "launchpad.net/juju-core/worker/uniter/jujuc"
> @@ -107,6 +108,7 @@
>  func Main(args []string) {
>      var code int = 1
>      var err error
> +    runtime.GOMAXPROCS(runtime.NumCPU())
>      commandName := filepath.Base(args[0])
>      if commandName == "jujud" {
>          code, err = jujuDMain(args)
>
> I'm not sure exactly how we want to spell it, but this *does* help
> when scaling up jujud on machine-0.
>
> While doing my "create 5000 connections and then restart jujud" test,
> it turns out that the time it takes to get back to a sane state is
> actually CPU limited, and jujud is capable of using all 4 cores on my
> VM.
>
> I can see that we might only want this on state server nodes, because
> on other machines the agents might be competing with workloads for
> resources, and we want to make sure the agents aren't saturating the
> machine.
>
> FWIW, I retried the test I was doing in Burlingame with the root
> machine as an m1.xlarge and the above patch applied, and it doesn't
> get 'hung' the way the m1.small did.
>
> With 6000 units of ubuntu-1 running, I did "restart jujud-machine-0"
> and it took 23 minutes before the log went quiet again. During this
> time, jujud generated 1.6M lines of log file (285MB). I have a
> machine-0.log.gz, but it is 160MB compressed (2.4GB uncompressed).
>
> So my current guess about my m1.small test is that we simply
> saturated the single CPU the system had, which left no cycles for
> mongodb to actually answer the requests that were coming in.
>
> We did end up at 2.2GB with 5429 active connections. (machine-2's
> machine agent was down for a long time and I couldn't even ssh into
> the machine [the terminal would just hang]. It did come back after
> another 30 minutes or so, but then it spun indefinitely because there
> was a corrupt file in the .git checkout:
>
> error: object file
> .git/objects/53/94dcc08c1ae1519b87bc994640e9f6c5c7295c is empty
> fatal: loose object 5394dcc08c1ae1519b87bc994640e9f6c5c7295c (stored
> in .git/objects/53/94dcc08c1ae1519b87bc994640e9f6c5c7295c) is corrupt
>
> It was also using 7GB+ on disk, and there were a *lot* of
> /var/log/juju/tools/unpacking-* directories.)
>
> I'm curious what the story is when you have a machine that is just
> broken and how to bring it back to life, though I don't think having
> 800 units on one machine is a standard use case for us :).
>
> Anyway, I feel a bit better knowing that my scale testing was only
> really failing because we were on an m1.small.
>
> John
> =:->
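If something like this does go in, presumably we'd want the call gated rather than unconditional, per John's point about state server nodes. A sketch under that assumption; the isStateServer flag is hypothetical, not an existing jujud option (in practice it would be derived from the machine's jobs):

package main

import "runtime"

// maybeUseAllCores raises GOMAXPROCS to the core count, but only on a
// state server; other machine agents keep the Go default so they do
// not compete with unit workloads for CPU.
// isStateServer is a hypothetical flag, not an existing jujud option.
func maybeUseAllCores(isStateServer bool) {
	if isStateServer {
		runtime.GOMAXPROCS(runtime.NumCPU())
	}
}

func main() {
	maybeUseAllCores(true)
}

That would keep unit machines on the old behaviour while still letting machine-0 use all of its cores.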
