Wes,

The process, as I see it, should be the following:

1. A commit triggers a build in TeamCity. I have already set up TeamCity, but we can use whatever CI we like.
2. TeamCity uses a pool of identical machines and runs the predefined (or all) performance benchmarks on one of the build machines from the pool.
3. Each benchmark generates output; since we use Google Benchmark, that output is a JSON file.
4. The TeamCity build step that runs the benchmarks gathers all those files and parses them.
5. For each parsed output it creates an entry in the DB, with the commit ID as the key plus whatever auxiliary information is helpful (a rough sketch of steps 3-5 is below).
6. Codespeed, sitting on top of that database, visualizes the data in a dashboard, marking regressions red and progressions green relative to either a baseline you define or the previous commit, since all commits are ordered in time.
7. You can create custom queries to compare specific commits or see trends on the timeline.
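To make steps 3-5 concrete, here is a minimal sketch of the parsing/DB step. The script name, table schema, and DB choice (SQLite) are placeholders I made up for illustration; the only parts I am relying on are Google Benchmark's JSON report (produced with --benchmark_out=<file> --benchmark_out_format=json) and its "benchmarks" array:

    # ingest_benchmarks.py - hypothetical helper; schema and names are illustrative
    import glob, json, sqlite3, sys

    def ingest(commit_id, results_dir, db_path="benchmarks.db"):
        conn = sqlite3.connect(db_path)
        conn.execute("""CREATE TABLE IF NOT EXISTS results (
                            commit_id TEXT, benchmark TEXT, real_time REAL,
                            cpu_time REAL, time_unit TEXT, iterations INTEGER)""")
        for path in glob.glob(results_dir + "/*.json"):
            with open(path) as f:
                report = json.load(f)
            # each Google Benchmark JSON report carries a "benchmarks" array
            for b in report["benchmarks"]:
                conn.execute("INSERT INTO results VALUES (?, ?, ?, ?, ?, ?)",
                             (commit_id, b["name"], b["real_time"], b["cpu_time"],
                              b["time_unit"], b["iterations"]))
        conn.commit()

    if __name__ == "__main__":
        # e.g. pass the commit ID the CI exposes (such as TeamCity's
        # build.vcs.number parameter) and the directory with the *.json reports
        ingest(sys.argv[1], sys.argv[2])

Codespeed (or Dana, or anything else) then only has to read that table; the CI remains the orchestrator.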
I am not mandating Codespeed or anything else, but we should start with something. We can use something more sophisticated later, like Influx.

> In the benchmarking one of the hardest parts (IMHO) is the process/workflow
> automation. I'm in support of the development of a "meta-benchmarking"
> framework that offers automation, extensibility, and possibility for
> customization.

[>] Meta is good, and I totally support it, but while we are working on that there is a need for something very simple yet usable.

> One of the reasons that people don't do more benchmarking as part of their
> development process is that the tooling around it isn't great.
> Using a command line tool [1] that outputs unconfigurable text to the terminal
> to compare benchmarks seems inadequate to me.

[>] I would argue here: it is the minimal configuration that works with external tooling without creating a huge infrastructure around it. We already use the Google Benchmark library, which provides all the needed output formats. And if you do not like Codespeed we can use anything else, e.g. Dana (https://github.com/google/dana) from Google.

> In the cited example
>
> $ benchcmp old.txt new.txt
>
> Where do old.txt and new.txt come from? I would like to have that detail
> (build of appropriate component, execution of benchmarks and collection of
> results) automated.

[>] In the case of Go it is: $ go test -run=^$ -bench=. ./... > old.txt. Then you switch to the new branch and do the same with > new.txt, then you run benchcmp and it does the comparison. Three bash commands. (A small sketch of how to feed our Google Benchmark output to the same tools is below.)

> FWIW, 7 and a half years ago [2] I wrote a small project called vbench to
> assist with benchmark automation, so this has been a long-term interest of
> mine. Codespeed existed in 2011, here is what I wrote about it in December
> 2011, and it is definitely odd to find myself typing almost the exact same
> words years later:
>
> "Before starting to write a new project I looked briefly at codespeed... The
> dealbreaker is that codespeed is just a web application. It doesn't actually
> (to my knowledge, someone correct me if I'm wrong?) have any kind of a
> framework for orchestrating the running of benchmarks throughout your code
> history."

[>] I totally agree with you. But the good part is that it doesn't need to have orchestration: TeamCity or any other CI will do those steps for you. And the fact that you can run the benchmarks by hand, and the CI just replicates your actions, makes it suitable for most cases. And I don't care about Codespeed or asv; as you said, it is just a stupid web app. The most important part is to create a working pipeline. While we are looking for the best salt-cellar, we can use the plastic one. :)

> asv [3] is a more modern and evolved version of vbench. But it's
> Python-specific. I think we need the same kind of thing except being able to
> automate the execution of any benchmarks for any component in the Arrow
> project. So we have some work to do.

[>] Here is the catch: trying to cover any benchmark for any component will consume time and resources, and still something will be left behind. It is hard to cover the general case and at the same time guarantee that a particular one, like C++, is covered well.
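On the benchcmp/benchstat point (Sebastien's suggestion further down the thread about reusing the Go "benchmark" file format): since Google Benchmark already emits JSON, a tiny converter would let us diff two C++ runs with the exact same tools. A minimal sketch, assuming the Go text format is lines of the form "BenchmarkName <iterations> <value> ns/op" (the script name and the exact formatting details are my assumptions from memory):

    # gbench_to_gobench.py - hypothetical converter; output format details assumed
    import json, sys

    def convert(json_path):
        with open(json_path) as f:
            report = json.load(f)
        for b in report["benchmarks"]:
            # benchcmp/benchstat expect lines roughly like:
            #   BenchmarkConcat   1000000   523 ns/op
            name = "Benchmark" + b["name"].replace("/", "_")
            print("%s\t%d\t%.2f %s/op" % (name, b["iterations"],
                                          b["cpu_time"], b["time_unit"]))

    if __name__ == "__main__":
        convert(sys.argv[1])

Run it against the JSON from the old and new commits to produce old.txt and new.txt, and then benchcmp old.txt new.txt works exactly as in the Go example quoted below.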
>
> - Wes
>
> [1]: https://github.com/golang/tools/blob/master/cmd/benchcmp/benchcmp.go
> [2]: http://wesmckinney.com/blog/introducing-vbench-new-code-performance-analysis-and-monitoring-tool/
> [3]: https://github.com/airspeed-velocity/asv
>
> On Wed, Apr 24, 2019 at 11:18 AM Sebastien Binet <bi...@cern.ch> wrote:
> >
> > On Wed, Apr 24, 2019 at 11:22 AM Antoine Pitrou <anto...@python.org> wrote:
> > >
> > > Hi Areg,
> > >
> > > On 23/04/2019 at 23:43, Melik-Adamyan, Areg wrote:
> > > > Because we are using Google Benchmark, which has a specific format,
> > > > there is a tool called benchcmp which compares two runs:
> > > >
> > > > $ benchcmp old.txt new.txt
> > > > benchmark         old ns/op   new ns/op   delta
> > > > BenchmarkConcat   523         68.6        -86.88%
> > > >
> > > > So the comparison part is done and there is no need to create
> > > > infra for that.
> >
> > "surprisingly" Go is already using that benchmark format :) and (on
> > top of a Go-based benchcmp command) there is also a benchstat command
> > that, given a set of multiple before/after data points, adds some
> > amount of statistical analysis:
> > https://godoc.org/golang.org/x/perf/cmd/benchstat
> >
> > using the "benchmark" file format of benchcmp and benchstat would
> > allow better cross-language interop.
> >
> > cheers,
> > -s