On 10/20/2013 02:36 PM, Alex Gaynor wrote:
> There are several issues involved in doing automated regression
> checking for benchmarks:
>
> - You need a platform which is stable. Right now all our CI runs on
>   virtualized instances, and I don't think there's any particular
>   guarantee it'll be the same underlying hardware; furthermore,
>   virtualized systems tend to be very noisy and don't give you the
>   stability you need.
> - You need your benchmarks to be very high precision if you really
>   want to rule out regressions of more than N% without a lot of
>   false positives.
> - You need more than just checks on individual builds; you need
>   long-term trend checking - 100 1% regressions are worse than a
>   single 50% regression.
>
> Alex

Agreed on all these points. However, I think none of them changes where the load generation scripts should be developed.
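(To put rough numbers on Alex's trend point, purely for illustration: small regressions compound multiplicatively, so a long run of 1% hits is far worse than one big one.)

    # Illustrative arithmetic: one hundred compounded 1% regressions
    # versus a single 50% regression.
    compounded = 1.01 ** 100   # ~2.70x the original run time (~170% slower)
    single = 1.50              # 1.5x the original run time (50% slower)
    print(compounded, single)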

They mostly speak to ensuring that we've got a repeatable hardware environment for running the benchmarks, and that we've got the right kind of data collection and analysis to make the results statistically valid.

Point #1 is hard - it really does require bare metal. But let's put that aside for now, as I think there may be bare-metal clouds being made available that could solve it.

But the rest of this is just software. If we had performance metering available in either the core servers or as part of Tempest, we could get appropriate data. Then you'd need a good statistics engine to provide statistically relevant processing of that data: not just line graphs, but real error bars and confidence intervals based on large numbers of runs. I've seen way too many line graphs arguing one point or another about config changes that turn out to have error bars far wider than the effects being claimed. Any system that doesn't expose that isn't really going to be useful.
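To sketch what I mean by real error bars (a minimal example of my own, standard library only; the run times are made up and a large-sample normal approximation is assumed):

    import math
    import statistics

    def confidence_interval(samples, z=1.96):
        """Return (mean, lower, upper) for a ~95% confidence interval."""
        n = len(samples)
        mean = statistics.mean(samples)
        sem = statistics.stdev(samples) / math.sqrt(n)  # standard error
        return mean, mean - z * sem, mean + z * sem

    runs = [42.1, 41.8, 43.0, 42.6, 41.9, 42.3]  # hypothetical run times (s)
    print(confidence_interval(runs))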

Actual performance regressions are going to be *really* hard to find in the gate, just because of the rate of code change that we have, and the variability we've seen on the guests.
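For deciding whether two builds actually differ despite that variability, something like a Welch-style two-sample test is the minimum bar. Here's a rough standard-library sketch of mine (normal approximation, so it assumes large run counts; the data is hypothetical):

    import math
    import statistics

    def regression_p_value(baseline, candidate):
        """Two-sided p-value that the mean run times differ (large n assumed)."""
        m1, m2 = statistics.mean(baseline), statistics.mean(candidate)
        v1 = statistics.variance(baseline) / len(baseline)
        v2 = statistics.variance(candidate) / len(candidate)
        z = (m2 - m1) / math.sqrt(v1 + v2)
        return math.erfc(abs(z) / math.sqrt(2))  # 2 * (1 - Phi(|z|))

    baseline = [42.1, 41.8, 43.0, 42.6]    # hypothetical run times (s)
    candidate = [44.9, 45.3, 44.2, 45.0]
    print(regression_p_value(baseline, candidate))  # tiny p -> likely real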

Honestly, a statistics engine that just took in our existing large sets of data and established baseline variability would be a great step forward (that's new invention; no one has that right now). I'm sure we can figure out a good way to bring the load generation into Tempest to be consistent with our existing validation and scenario tests. The metering could easily be proposed as a nova extension (a la coverage). And that seems to leave you with a setup tool to pull this together in arbitrary environments.
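On baseline variability from existing data, even something as crude as a control chart over historical run times would tell us when we go off the rails (again, just a sketch of mine with made-up numbers):

    import statistics

    def flag_outliers(history, recent, k=3.0):
        """Return entries of `recent` outside baseline mean +/- k * stdev."""
        mean = statistics.mean(history)
        stdev = statistics.stdev(history)
        return [t for t in recent if abs(t - mean) > k * stdev]

    history = [42.0, 41.7, 42.4, 42.1, 41.9, 42.3, 42.0]  # hypothetical
    print(flag_outliers(history, [42.2, 47.8]))  # only 47.8 gets flagged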

And that's really what I mean about integrating better: whenever possible, figuring out how functionality could be added to existing projects, especially when that means they are enhanced not only for your use case but for use cases those projects have wanted to cover for a while. (Seriously, I'd love to have statistically valid run-time statistics for Tempest that show us when we go off the rails, like we did last week for a few days, and that quantify long-term variability and trends in the stack.) It's harder in the short term, because it means compromises along the way, but the long-term benefit to OpenStack is much greater than another project which duplicates effort from a bunch of existing ones.

        -Sean

--
Sean Dague
http://dague.net
