Hi folks,

We had another scale test today to analyse why the controller CPU usage didn't fall away as expected when the models were removed.

I'll be filing a bunch of bugs from the analysis process, but there is one bug that is, I believe, the culprit for the high CPU usage.

Interestingly enough, Juju developers were not able to reproduce the problem with smaller deployments. The scale we were testing was 140 models, each with 10 machines and about 20 units in total.

During the teardown process of the testing, all models were destroyed at once.

Most of the responsive nature of Juju is driven off the watchers. These watchers watch the mongo oplog for document changes. What happened was that there were so many mongo operations that the capped collection backing the oplog was completely replaced between the watchers' polling intervals. The watchers then errored out in a new and unexpected way.
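
For anyone not familiar with that mechanism, the failure mode looks roughly like this sketch (illustrative only; the names and types here are made up and are not the actual watcher code):

    package sketch

    import "errors"

    // OplogEntry is a simplified view of an oplog document.
    type OplogEntry struct {
        Timestamp int64 // the oplog "ts" field, simplified to an int64
    }

    // Oplog is whatever gives the watcher access to the capped collection.
    type Oplog interface {
        // OldestTimestamp returns the timestamp of the oldest entry still
        // present in the capped collection.
        OldestTimestamp() (int64, error)
        // EntriesSince returns all entries newer than the given timestamp.
        EntriesSince(ts int64) ([]OplogEntry, error)
    }

    // ErrOplogRolledOver means changes were lost between polls, so the
    // watchers' view of the world can no longer be trusted.
    var ErrOplogRolledOver = errors.New("oplog rolled over: changes were missed")

    // poll fetches the changes since the last poll. If the oldest entry left
    // in the capped collection is already newer than the last one we saw,
    // the whole collection was replaced between polls and we cannot know
    // what we missed.
    func poll(oplog Oplog, lastSeen int64) ([]OplogEntry, error) {
        oldest, err := oplog.OldestTimestamp()
        if err != nil {
            return nil, err
        }
        if oldest > lastSeen {
            return nil, ErrOplogRolledOver
        }
        return oplog.EntriesSince(lastSeen)
    }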

Effectively, the watcher infrastructure needs an internal reset button it can hit when this happens, one that invalidates all the watchers. This should cause all the workers to be torn down and restarted from a known good state.
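
The kind of thing I have in mind is roughly this (again, just a sketch of the idea with made-up types, not a design):

    package sketch

    // watcher is anything that can be forcibly invalidated.
    type watcher interface {
        // Invalidate kills the watcher with an error; anything built on top
        // of it observes the error and bounces its worker.
        Invalidate(reason error)
    }

    // hub tracks every live watcher.
    type hub struct {
        watchers []watcher
    }

    // resetAll is the "reset button": invalidate every live watcher so the
    // workers depending on them get torn down by their runners and
    // restarted, rebuilding their state from the database rather than from
    // missed events.
    func (h *hub) resetAll(reason error) {
        for _, w := range h.watchers {
            w.Invalidate(reason)
        }
    }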

There was also a model that got stuck being destroyed; this was traced back to the worker that should be doing the destruction not noticing that it had anything to do.

All the CPU usage can be traced back to the 139 models in the apiserver's state pool, each still running leadership and base watcher workers. The state pool should have removed all of these instances, but it didn't notice the models were gone.
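
The shape of the fix there is something like the sketch below (the Kill/Wait pair mirrors the usual worker pattern; the rest is made up for illustration and is not the real state pool code):

    package sketch

    // worker follows the usual Kill/Wait shape: Kill asks it to stop, Wait
    // blocks until it has stopped.
    type worker interface {
        Kill()
        Wait() error
    }

    // pooledState holds whatever is running on behalf of one model.
    type pooledState struct {
        workers []worker
    }

    // StatePool tracks per-model state, keyed by model UUID.
    type StatePool struct {
        pool map[string]*pooledState
    }

    // removeModel stops everything associated with a dead model and forgets
    // it, so the leadership and base watcher workers don't keep burning CPU.
    func (p *StatePool) removeModel(uuid string) {
        ps, ok := p.pool[uuid]
        if !ok {
            return
        }
        for _, w := range ps.workers {
            w.Kill()
            _ = w.Wait()
        }
        delete(p.pool, uuid)
    }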

There are some other bugs around logging things as errors that really aren't errors, which contributed to the log noise, but the fundamental problem here is not being robust in the face of too much change at once.

This needs to be fixed for the 2.2 release candidate, so it may well push that out past the end of this week.

Tim
