Because we are using Google Benchmark, which has a specific output format, there is a tool called benchcmp that compares two runs:
$ benchcmp old.txt new.txt
benchmark          old ns/op    new ns/op    delta
BenchmarkConcat    523          68.6         -86.88%

So the comparison part is done, and there is no need to create infrastructure for it. What we need is to change the stdout output of `ctest -L Benchmarks` to the standard Google Benchmark console output:

--------------------------------------------------------------
Benchmark                      Time           CPU   Iterations
--------------------------------------------------------------
BM_UserCounter/threads:1    9504 ns       9504 ns        73787
BM_UserCounter/threads:2    4775 ns       9550 ns        72606
BM_UserCounter/threads:4    2508 ns       9951 ns        70332
BM_UserCounter/threads:8    2055 ns       9933 ns        70344
BM_UserCounter/threads:16   1610 ns       9946 ns        70720
BM_UserCounter/threads:32   1192 ns       9948 ns        70496

The script on the build machine will parse this and, along with the machine info, send it to the DB. Running a subset is done by passing --benchmark_filter=<...>:

$ ./run_benchmarks.x --benchmark_filter=BM_memcpy/32
Run on (1 X 2300 MHz CPU )
2016-06-25 19:34:24
Benchmark              Time           CPU Iterations
----------------------------------------------------
BM_memcpy/32          11 ns         11 ns   79545455
BM_memcpy/32k       2181 ns       2185 ns     324074
BM_memcpy/32          12 ns         12 ns   54687500
BM_memcpy/32k       1834 ns       1837 ns     357143

Or we can create a buildbot mode and produce output in JSON format:

{
  "context": {
    "date": "2019/03/17-18:40:25",
    "num_cpus": 40,
    "mhz_per_cpu": 2801,
    "cpu_scaling_enabled": false,
    "build_type": "debug"
  },
  "benchmarks": [
    {
      "name": "BM_SetInsert/1024/1",
      "iterations": 94877,
      "real_time": 29275,
      "cpu_time": 29836,
      "bytes_per_second": 134066,
      "items_per_second": 33516
    }
  ]
}

So we have all the ingredients and do not need to reinvent anything; we just need to agree on the process: what is done when, and what goes where in which format.

---------- Forwarded message ---------
From: Francois Saint-Jacques <fsaintjacq...@gmail.com>
Date: Tue, Apr 16, 2019 at 11:44 AM
Subject: Re: [Discuss] Benchmarking infrastructure
To: <dev@arrow.apache.org>

Hello,

A small status update: I recently implemented archery [1], a tool for
Arrow benchmark comparison [2]. The documentation ([3] and [4]) is in the
pull request.

The primary goal is to compare two commits (and/or build directories) for
performance regressions. For now, it supports C++ benchmarks. This is
accessible via the command `archery benchmark diff`. The end result is one
comparison per line, with a regression indicator.

Currently, there is no facility to perform a single "run", e.g. run the
benchmarks in the current workspace without comparing to a previous
version. This was initially implemented in [5] but depended heavily on
ctest (with no control over execution). Once [1] is merged, I'll
re-implement single runs (ARROW-5071) in terms of archery, since it
already executes and parses C++ benchmarks.

The next goal is to be able to push the results into an upstream database,
be it the one defined in dev/benchmarking, or codespeed as Areg proposed.
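To make that last step concrete, here is a rough Python sketch of what such
a push script could look like, assuming gbenchmark's JSON report (the
format shown above, produced with --benchmark_format=json or
--benchmark_out_format=json). The endpoint URL and the record schema are
placeholders, not an agreed-upon format:

import json
import sys

import requests  # third-party HTTP client (pip install requests)

DB_ENDPOINT = "https://example.org/api/benchmarks"  # placeholder endpoint

def push_results(report_path):
    """Read a gbenchmark JSON report and POST one record per benchmark."""
    with open(report_path) as f:
        report = json.load(f)
    context = report["context"]  # num_cpus, mhz_per_cpu, build_type, ...
    for bench in report["benchmarks"]:
        record = {
            "name": bench["name"],
            "iterations": bench["iterations"],
            "real_time": bench["real_time"],
            "cpu_time": bench["cpu_time"],
            "context": context,
        }
        # The record schema here is made up; the real one is exactly what
        # we still need to agree on.
        requests.post(DB_ENDPOINT, json=record).raise_for_status()

if __name__ == "__main__":
    push_results(sys.argv[1])

This would be driven by something like
./run_benchmarks.x --benchmark_out=report.json --benchmark_out_format=json
followed by `python push_results.py report.json` on the build machine.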
The steps required for this are:

- ARROW-5071: Run and format benchmark results for upstream consumption
  (ideally under the `archery benchmark run` sub-command)
- ARROW-5175: Make a list of benchmarks to include in regression checks
- ARROW-4716: Collect machine and benchmark context
- ARROW-TBD: Push benchmark results to the upstream database

In parallel, with ARROW-4827, Krisztian and I are working on two related
buildbot sub-projects enabling some regression detection:

- Triggering on-demand benchmark comparisons via comments in a PR (as
  proposed by Wes)
- Regression checks on merges to master (without database support)

François

P.S. A collateral benefit of this PR is that archery is a modular Python
library and can be used for other purposes; e.g. it could centralize the
orphaned scripts in dev/ (linting, release, and merge), since it offers
utilities to handle the Arrow sources, git, and cmake, and exposes a
usable CLI interface (with documentation).

[1] https://github.com/apache/arrow/pull/4141
[2] https://jira.apache.org/jira/browse/ARROW-4827
[3] https://github.com/apache/arrow/blob/512ae64bc074a0b620966131f9338d4a1eed2356/docs/source/developers/benchmarks.rst
[4] https://github.com/apache/arrow/pull/4141/files#diff-7a8805436a6884ddf74fe3eaec697e71R216
[5] https://github.com/apache/arrow/pull/4077

On Fri, Mar 29, 2019 at 3:21 PM Melik-Adamyan, Areg
<areg.melik-adam...@intel.com> wrote:
>
> > When you say "output is parsed", how is that exactly? We don't have
> > any scripts in the repository to do this yet (I have some comments on
> > this below). We also have to collect machine information and insert
> > that into the database. From my perspective we have quite a bit of
> > engineering work on this topic ("benchmark execution and data
> > collection") to do.
>
> Yes, I wrote one as a test. Then it can POST the JSON structure to the
> needed endpoint. Everything else will be done in the
>
> > My team and I have some physical hardware (including an Aarch64
> > Jetson TX2 machine, might be interesting to see what the ARM64
> > results look like) where we'd like to run benchmarks and upload the
> > results also, so we need to write some documentation about how to add
> > a new machine and set up a cron job of some kind.
>
> If it can run Linux, then we can set it up.
>
> > I'd like to eventually have a bot that we can ask to run a benchmark
> > comparison versus master. Reporting on all PRs automatically might be
> > quite a bit of work (and load on the machines).
>
> You should be able to choose the comparison between any two points:
> master vs. PR, master now vs. master yesterday, etc.
>
> > I thought the idea (based on our past e-mail discussions) was that we
> > would implement benchmark collectors (as programs in the Arrow git
> > repository) for each benchmarking framework, starting with gbenchmark
> > and expanding to include ASV (for Python) and then others.
>
> I'll open a PR and am happy to put it into Arrow.
>
> > It seems like writing the benchmark collector script that runs the
> > benchmarks, collects machine information, and inserts data into an
> > instance of the database is the next milestone. Until that's done it
> > seems difficult to do much else.
>
> Ok, will update the Jira ARROW-5070 and link ARROW-5071.
>
> Thanks.
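As an addendum on the machine-information part of that collector script
(ARROW-4716): a minimal sketch, assuming Python (like archery) and
assuming that host identity beyond what gbenchmark's own context block
already reports (hostname, architecture, kernel are my guesses at useful
fields, e.g. to tell the Jetson TX2 apart from x86 machines) is what we
want to capture:

import datetime
import os
import platform

def machine_context():
    """Host identity to attach to each uploaded benchmark record."""
    return {
        "date": datetime.datetime.utcnow().strftime("%Y/%m/%d-%H:%M:%S"),
        "hostname": platform.node(),
        "architecture": platform.machine(),  # e.g. x86_64, aarch64
        "kernel": platform.release(),
        "num_cpus": os.cpu_count(),
    }

if __name__ == "__main__":
    print(machine_context())

gbenchmark already emits num_cpus, mhz_per_cpu, and cpu_scaling_enabled in
its JSON context, so the collector mainly needs to add fields like these
that identify which machine the run came from.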