Because we are using Google Benchmark, which has a specific output format, 
there is a tool called benchcmp which compares two runs:

$ benchcmp old.txt new.txt
benchmark           old ns/op     new ns/op     delta
BenchmarkConcat     523           68.6          -86.88%

So the comparison part is done and there is no need to create infra for that.

What we need is to change the output that ctest -L Benchmarks prints to stdout 
into the standard Google Benchmark output:
--------------------------------------------------------------
Benchmark                        Time           CPU Iterations
--------------------------------------------------------------
BM_UserCounter/threads:1      9504 ns       9504 ns      73787
BM_UserCounter/threads:2      4775 ns       9550 ns      72606
BM_UserCounter/threads:4      2508 ns       9951 ns      70332
BM_UserCounter/threads:8      2055 ns       9933 ns      70344
BM_UserCounter/threads:16     1610 ns       9946 ns      70720
BM_UserCounter/threads:32     1192 ns       9948 ns      70496

The script on the build machine will parse this output and send it, along with 
the machine info, to the database.
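
As a rough illustration (not the actual build-machine script), the parsing 
step could look like the Python sketch below; it assumes the default Google 
Benchmark console columns (Benchmark, Time, CPU, Iterations) with times 
reported in nanoseconds:

import re

# Hypothetical sketch of the parsing step; assumes the default Google
# Benchmark console layout with times in nanoseconds.
LINE_RE = re.compile(
    r"^(?P<name>\S+)\s+(?P<time>[\d.]+) ns\s+(?P<cpu>[\d.]+) ns\s+(?P<iters>\d+)\s*$")

def parse_console_output(text):
    """Turn Google Benchmark console output into a list of records."""
    results = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            results.append({
                "name": match.group("name"),
                "real_time_ns": float(match.group("time")),
                "cpu_time_ns": float(match.group("cpu")),
                "iterations": int(match.group("iters")),
            })
    return results

The machine info itself would be gathered separately and attached to the 
payload before it is sent (see the JSON sketch further below).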

Running a subset of the benchmarks is done by passing --benchmark_filter=<...>:
$ ./run_benchmarks.x --benchmark_filter=BM_memcpy/32
Run on (1 X 2300 MHz CPU )
2016-06-25 19:34:24
Benchmark              Time           CPU Iterations
----------------------------------------------------
BM_memcpy/32          11 ns         11 ns   79545455
BM_memcpy/32k       2181 ns       2185 ns     324074
BM_memcpy/32          12 ns         12 ns   54687500
BM_memcpy/32k       1834 ns       1837 ns     357143
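
Note that the filter is a regular expression with partial matching, which is 
why BM_memcpy/32 above also picks up BM_memcpy/32k. For regression checks 
against an agreed-upon list of benchmarks, the subset could be expressed as a 
single anchored alternation; a hypothetical Python sketch (the binary name and 
benchmark names are placeholders):

import re
import subprocess

# Hypothetical regression subset; the names are placeholders.
SUBSET = ["BM_memcpy/32", "BM_memcpy/32k"]

def run_subset(binary="./run_benchmarks.x"):
    """Run only the listed benchmarks via a single anchored filter regex."""
    pattern = "|".join("^" + re.escape(name) + "$" for name in SUBSET)
    result = subprocess.run(
        [binary, "--benchmark_filter=" + pattern],
        capture_output=True, text=True, check=True)
    return result.stdout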

Or we can create a buildbot mode and produce output in JSON format:
{
  "context": {
    "date": "2019/03/17-18:40:25",
    "num_cpus": 40,
    "mhz_per_cpu": 2801,
    "cpu_scaling_enabled": false,
    "build_type": "debug"
  },
  "benchmarks": [
    {
      "name": "BM_SetInsert/1024/1",
      "iterations": 94877,
      "real_time": 29275,
      "cpu_time": 29836,
      "bytes_per_second": 134066,
      "items_per_second": 33516
    }
  ]
}
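
Google Benchmark can already emit this JSON directly via 
--benchmark_format=json (or --benchmark_out=<file>), so the buildbot mode 
mostly needs to run the binary, add machine context, and POST the result. A 
minimal sketch, assuming a hypothetical ingest endpoint and a placeholder 
binary name:

import json
import platform
import subprocess
import urllib.request

# Placeholder endpoint; the real database/endpoint is still to be agreed on.
INGEST_URL = "https://benchmarks.example.org/ingest"

def run_and_push(binary="./run_benchmarks.x"):
    """Run the benchmarks in JSON mode and POST the results upstream."""
    out = subprocess.run(
        [binary, "--benchmark_format=json"],
        capture_output=True, text=True, check=True).stdout
    payload = json.loads(out)      # {"context": {...}, "benchmarks": [...]}
    payload["machine"] = {         # extra machine context (assumed schema)
        "hostname": platform.node(),
        "os": platform.platform(),
        "arch": platform.machine(),
    }
    request = urllib.request.Request(
        INGEST_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST")
    with urllib.request.urlopen(request) as response:
        return response.status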

So we have all the ingredients and do not need to reinvent anything; we just 
need to agree on the process: what is done, when, where the results are put, 
and in which format.


---------- Forwarded message ---------
From: Francois Saint-Jacques <fsaintjacq...@gmail.com>
Date: Tue, Apr 16, 2019 at 11:44 AM
Subject: Re: [Discuss] Benchmarking infrastructure
To: <dev@arrow.apache.org>


Hello,

A small status update: I recently implemented archery [1], a tool for Arrow 
benchmark comparison [2]. The documentation ([3] and [4]) is in the 
pull request. The primary goal is to compare 2 commits (and/or build
directories) for performance regressions. For now, it supports C++ benchmarks.
This is accessible via the command `archery benchmark diff`. The end result is 
one comparison per line, with a regression indicator.

Currently, there is no facility to perform a single "run", e.g. running the 
benchmarks in the current workspace without comparing to a previous version. 
This was initially implemented in [5] but depended heavily on ctest (with no 
control over execution). Once [1] is merged, I'll re-implement the single run 
(ARROW-5071) in terms of archery, since it already executes and parses C++ 
benchmarks.

The next goal is to be able to push the results into an upstream database, be 
it the one defined in dev/benchmarking, or codespeed as Areg proposed. The 
steps required for this:
- ARROW-5071: Run and format benchmark results for upstream consumption
  (ideally under the `archery benchmark run` sub-command)
- ARROW-5175: Make a list of benchmarks to include in regression checks
- ARROW-4716: Collect machine and benchmarks context
- ARROW-TBD: Push benchmark results to upstream database

In parallel, with ARROW-4827, Krisztian and I are working on 2 related buildbot 
sub-projects enabling some regression detection:
- Triggering on-demand benchmark comparison via comments in PR
   (as proposed by Wes)
- Regression check on master merge (without database support)

François

P.S.
A side benefit of this PR is that archery is a modular Python library and can 
be used for other purposes, e.g. it could centralize the orphaned scripts in 
dev/ (linting, release, and merge), since it offers utilities for handling 
Arrow sources, git, and cmake, and exposes a usable CLI interface (with 
documentation).

[1] https://github.com/apache/arrow/pull/4141
[2] https://jira.apache.org/jira/browse/ARROW-4827
[3]
https://github.com/apache/arrow/blob/512ae64bc074a0b620966131f9338d4a1eed2356/docs/source/developers/benchmarks.rst
[4]
https://github.com/apache/arrow/pull/4141/files#diff-7a8805436a6884ddf74fe3eaec697e71R216
[5] https://github.com/apache/arrow/pull/4077

On Fri, Mar 29, 2019 at 3:21 PM Melik-Adamyan, Areg 
<areg.melik-adam...@intel.com> wrote:

> >When you say "output is parsed", how is that exactly? We don't have any
> >scripts in the repository to do this yet (I have some comments on this
> >below). We also have to collect machine information and insert that
> >into the database. From my perspective we have quite a bit of
> >engineering work on this topic ("benchmark execution and data
> >collection") to do.
> Yes, I wrote one as a test. Then it can do a POST of the JSON structure
> to the needed endpoint. Everything else will be done in the
>
> >My team and I have some physical hardware (including an Aarch64 Jetson
> >TX2 machine, might be interesting to see what the ARM64 results look
> >like) where we'd like to run benchmarks and upload the results also,
> >so we need to write some documentation about how to add a new machine
> >and set up a cron job of some kind.
> If it can run Linux, then we can set it up.
>
> >I'd like to eventually have a bot that we can ask to run a benchmark
> >comparison versus master. Reporting on all PRs automatically might be
> >quite a bit of work (and load on the machines)
> You should be able to choose the comparison between any two points:
> master-PR, master now - master yesterday, etc.
>
> >I thought the idea (based on our past e-mail discussions) was that we
> >would implement benchmark collectors (as programs in the Arrow git
> >repository) for each benchmarking framework, starting with gbenchmark
> >and expanding to include ASV (for Python) and then others
> I'll open a PR and am happy to put it into Arrow.
>
> >It seems like writing the benchmark collector script that runs the
> >benchmarks, collects machine information, and inserts data into an
> >instance of the database is the next milestone. Until that's done it
> >seems difficult to do much else
> Ok, will update the Jira 5070 and link the 5071.
>
> Thanks.
>
