Wes,

The process, as I see it, should be the following:

1. A commit triggers a build in TeamCity. I have already set up TeamCity, but we can use whatever CI we like.
2. TeamCity uses a pool of identical machines and runs the predefined (or all) performance benchmarks on one of the build machines from the pool.
3. Each benchmark generates output; since we use Google Benchmark, that output is a JSON file.
4. The TeamCity build step that runs the benchmarks gathers all those files and parses them.
5. For each parsed output it creates an entry in the DB, with the commit ID as the key plus whatever auxiliary information is helpful (a rough sketch of steps 3-5 is below).
6. Codespeed, sitting on top of that database, visualizes the data in a dashboard, marking regressions red and progressions green relative to either a baseline you define or the previous commit, since all commits are ordered in time.
7. You can create custom queries to compare specific commits or see trends on the timeline.
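To make steps 3-5 concrete, here is a minimal sketch of the parsing/DB step. The script name, table schema, and DB choice (SQLite) are placeholders I made up for illustration; the only parts I am relying on are Google Benchmark's JSON report (produced with --benchmark_out=<file> --benchmark_out_format=json) and its "benchmarks" array:

    # ingest_benchmarks.py - hypothetical helper; schema and names are illustrative
    import glob, json, sqlite3, sys

    def ingest(commit_id, results_dir, db_path="benchmarks.db"):
        conn = sqlite3.connect(db_path)
        conn.execute("""CREATE TABLE IF NOT EXISTS results (
                            commit_id TEXT, benchmark TEXT, real_time REAL,
                            cpu_time REAL, time_unit TEXT, iterations INTEGER)""")
        for path in glob.glob(results_dir + "/*.json"):
            with open(path) as f:
                report = json.load(f)
            # each Google Benchmark JSON report carries a "benchmarks" array
            for b in report["benchmarks"]:
                conn.execute("INSERT INTO results VALUES (?, ?, ?, ?, ?, ?)",
                             (commit_id, b["name"], b["real_time"], b["cpu_time"],
                              b["time_unit"], b["iterations"]))
        conn.commit()

    if __name__ == "__main__":
        # e.g. pass the commit ID the CI exposes (such as TeamCity's
        # build.vcs.number parameter) and the directory with the *.json reports
        ingest(sys.argv[1], sys.argv[2])

Codespeed (or Dana, or anything else) then only has to read that table; the CI remains the orchestrator.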
I am not mandating Codespeed or anything else, but we should start with something. We can use something more sophisticated later, like Influx.

> In the benchmarking one of the hardest parts (IMHO) is the process/workflow
> automation. I'm in support of the development of a "meta-benchmarking"
> framework that offers automation, extensibility, and possibility for
> customization.

[>] Meta is good, and I totally support it, but while we are working on that there is a need for something very simple yet usable.

> One of the reasons that people don't do more benchmarking as part of their
> development process is that the tooling around it isn't great.
> Using a command line tool [1] that outputs unconfigurable text to the terminal
> to compare benchmarks seems inadequate to me.

[>] I would argue here: it is the minimal configuration that works with external tooling without creating a huge infrastructure around it. We already use the Google Benchmark library, which provides all the needed output formats. And if you do not like Codespeed we can use anything else, e.g. Dana (https://github.com/google/dana) from Google.

> In the cited example
>
> $ benchcmp old.txt new.txt
>
> Where do old.txt and new.txt come from? I would like to have that detail
> (build of appropriate component, execution of benchmarks and collection of
> results) automated.

[>] In the case of Go it is: $ go test -run=^$ -bench=. ./... > old.txt. Then you switch to the new branch and do the same with > new.txt, then you run benchcmp and it does the comparison. Three bash commands. (A small sketch of how to feed our Google Benchmark output to the same tools is below.)

> FWIW, 7 and a half years ago [2] I wrote a small project called vbench to
> assist with benchmark automation, so this has been a long-term interest of
> mine. Codespeed existed in 2011, here is what I wrote about it in December
> 2011, and it is definitely odd to find myself typing almost the exact same
> words years later:
>
> "Before starting to write a new project I looked briefly at codespeed... The
> dealbreaker is that codespeed is just a web application. It doesn't actually
> (to my knowledge, someone correct me if I'm wrong?) have any kind of a
> framework for orchestrating the running of benchmarks throughout your code
> history."

[>] I totally agree with you. But the good part is that it doesn't need to have orchestration: TeamCity or any other CI will do those steps for you. And the fact that you can run the benchmarks by hand, and the CI just replicates your actions, makes it suitable for most cases. And I don't care about Codespeed or asv; as you said, it is just a stupid web app. The most important part is to create a working pipeline. While we are looking for the best salt-cellar, we can use the plastic one. :)

> asv [3] is a more modern and evolved version of vbench. But it's
> Python-specific. I think we need the same kind of thing except being able to
> automate the execution of any benchmarks for any component in the Arrow
> project. So we have some work to do.

[>] Here is the catch: trying to cover any benchmark for any component will consume time and resources, and still something will be left behind. It is hard to cover the general case and at the same time guarantee that a particular one, like C++, is covered well.
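On the benchcmp/benchstat point (Sebastien's suggestion further down the thread about reusing the Go "benchmark" file format): since Google Benchmark already emits JSON, a tiny converter would let us diff two C++ runs with the exact same tools. A minimal sketch, assuming the Go text format is lines of the form "BenchmarkName <iterations> <value> ns/op" (the script name and the exact formatting details are my assumptions from memory):

    # gbench_to_gobench.py - hypothetical converter; output format details assumed
    import json, sys

    def convert(json_path):
        with open(json_path) as f:
            report = json.load(f)
        for b in report["benchmarks"]:
            # benchcmp/benchstat expect lines roughly like:
            #   BenchmarkConcat   1000000   523 ns/op
            name = "Benchmark" + b["name"].replace("/", "_")
            print("%s\t%d\t%.2f %s/op" % (name, b["iterations"],
                                          b["cpu_time"], b["time_unit"]))

    if __name__ == "__main__":
        convert(sys.argv[1])

Run it against the JSON from the old and new commits to produce old.txt and new.txt, and then benchcmp old.txt new.txt works exactly as in the Go example quoted below.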
>
> - Wes
>
> [1]: https://github.com/golang/tools/blob/master/cmd/benchcmp/benchcmp.go
> [2]: http://wesmckinney.com/blog/introducing-vbench-new-code-performance-analysis-and-monitoring-tool/
> [3]: https://github.com/airspeed-velocity/asv
>
> On Wed, Apr 24, 2019 at 11:18 AM Sebastien Binet <bi...@cern.ch> wrote:
> >
> > On Wed, Apr 24, 2019 at 11:22 AM Antoine Pitrou <anto...@python.org> wrote:
> > >
> > > Hi Areg,
> > >
> > > On 23/04/2019 at 23:43, Melik-Adamyan, Areg wrote:
> > > > Because we are using Google Benchmark, which has a specific format,
> > > > there is a tool called benchcmp which compares two runs:
> > > >
> > > > $ benchcmp old.txt new.txt
> > > > benchmark         old ns/op   new ns/op   delta
> > > > BenchmarkConcat   523         68.6        -86.88%
> > > >
> > > > So the comparison part is done and there is no need to create
> > > > infra for that.
> >
> > "surprisingly" Go is already using that benchmark format :) and (on
> > top of a Go-based benchcmp command) there is also a benchstat command
> > that, given a set of multiple before/after data points, adds some
> > amount of statistical analysis:
> > https://godoc.org/golang.org/x/perf/cmd/benchstat
> >
> > using the "benchmark" file format of benchcmp and benchstat would
> > allow better cross-language interop.
> >
> > cheers,
> > -s