Areg, if you'd like help, I volunteer! I have no benchmarking experience, but tons of experience with databases; I can mock up the backend (database + HTTP) as a starting point for discussion, if this is the direction people want to go. A rough sketch of what I have in mind is below.
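To make that concrete, here is roughly the kind of PostgreSQL schema I'm picturing, based on the fields Wes lists below. All table and column names are placeholders and the types are guesses on my part; treat it as a discussion starter, not a design:

-- Rough sketch only: names and types are placeholders, not a final design.
-- One row per machine, one per benchmark definition, one per run result.

CREATE TABLE machine (
    machine_id        SERIAL PRIMARY KEY,
    machine_name      TEXT NOT NULL UNIQUE,   -- the "user id" Wes mentions
    cpu_model         TEXT NOT NULL,
    cpu_frequency_hz  BIGINT,                 -- nominal clock, in case of overclocking
    cpu_l1_cache_kb   INTEGER,
    cpu_l2_cache_kb   INTEGER,
    cpu_l3_cache_kb   INTEGER,
    cpu_throttling    BOOLEAN,                -- NULL if it can't be determined
    ram_bytes         BIGINT,
    gpu_model         TEXT                    -- NULL if no GPU
);

CREATE TABLE benchmark (
    benchmark_id    SERIAL PRIMARY KEY,
    benchmark_name  TEXT NOT NULL UNIQUE,
    languages       TEXT[] NOT NULL           -- e.g. '{C++,Python}'
);

CREATE TABLE benchmark_run (
    run_id        BIGSERIAL PRIMARY KEY,
    benchmark_id  INTEGER NOT NULL REFERENCES benchmark (benchmark_id),
    machine_id    INTEGER NOT NULL REFERENCES machine (machine_id),
    run_timestamp TIMESTAMPTZ NOT NULL,
    git_commit    TEXT NOT NULL,              -- commit hash of the codebase
    elapsed_s     DOUBLE PRECISION NOT NULL,  -- benchmark time
    mean_s        DOUBLE PRECISION,           -- NULL if the framework doesn't report it
    stddev_s      DOUBLE PRECISION
);

Splitting machine and benchmark out of the per-run table keeps each result row small and leaves us independent of any particular benchmarking tool's internal schema, which seems to be the main point below; plotting one benchmark over time is then just a SELECT ordered by run_timestamp.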
Is there a Jira ticket for this that I can jump into?

On Sun, Jan 20, 2019 at 3:24 PM Wes McKinney <wesmck...@gmail.com> wrote:
> hi Areg,
>
> This sounds great -- we've discussed building a more full-featured
> benchmark automation system in the past but nothing has been developed
> yet.
>
> Your proposal about the details sounds OK; the single most important
> thing to me is that we build and maintain a very general purpose
> database schema for building the historical benchmark database
>
> The benchmark database should keep track of:
>
> * Timestamp of benchmark run
> * Git commit hash of codebase
> * Machine unique name (sort of the "user id")
> * CPU identification for machine, and clock frequency (in case of
>   overclocking)
> * CPU cache sizes (L1/L2/L3)
> * Whether or not CPU throttling is enabled (if it can be easily determined)
> * RAM size
> * GPU identification (if any)
> * Benchmark unique name
> * Programming language(s) associated with benchmark (e.g. a benchmark
>   may involve both C++ and Python)
> * Benchmark time, plus mean and standard deviation if available, else NULL
>
> (maybe some other things)
>
> I would rather not be locked into the internal database schema of a
> particular benchmarking tool. So people in the community can just run
> SQL queries against the database and use the data however they like.
> We'll just have to be careful that people don't DROP TABLE or DELETE
> (but we should have daily backups so we can recover from such cases)
>
> So while we may make use of TeamCity to schedule the runs on the cloud
> and physical hardware, we should also provide a path for other people
> in the community to add data to the benchmark database on their
> hardware on an ad hoc basis. For example, I have several machines in
> my home on all operating systems (Windows / macOS / Linux, and soon
> also ARM64) and I'd like to set up scheduled tasks / cron jobs to
> report in to the database at least on a daily basis.
>
> Ideally the benchmark database would just be a PostgreSQL server with
> a schema we write down and keep backed up etc. Hosted PostgreSQL is
> inexpensive ($200+ per year depending on size of instance; this
> probably doesn't need to be a crazy big machine)
>
> I suspect there will be a manageable amount of development involved to
> glue each of the benchmarking frameworks together with the benchmark
> database. This can also handle querying the operating system for the
> system information listed above
>
> Thanks
> Wes
>
> On Fri, Jan 18, 2019 at 12:14 AM Melik-Adamyan, Areg
> <areg.melik-adam...@intel.com> wrote:
> >
> > Hello,
> >
> > I want to restart/attach to the discussions for creating Arrow
> > benchmarking dashboard. I want to propose performance benchmark run per
> > commit to track the changes.
> > The proposal includes building infrastructure for per-commit tracking
> > comprising of the following parts:
> > - Hosted JetBrains for OSS https://teamcity.jetbrains.com/ as a build
> >   system
> > - Agents running in cloud both VM/container (DigitalOcean, or others)
> >   and bare-metal (Packet.net/AWS) and on-premise (Nvidia boxes?)
> > - JFrog artifactory storage and management for OSS projects
> >   https://jfrog.com/open-source/#artifactory2
> > - Codespeed as a frontend https://github.com/tobami/codespeed
> >
> > I am volunteering to build such system (if needed more Intel folks will
> > be involved) so we can start tracking performance on various platforms and
> > understand how changes affect it.
> >
> > Please, let me know your thoughts!
> >
> > Thanks,
> > -Areg.
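For the ad-hoc reporting path Wes describes above (cron jobs on community machines pushing results in), the client side could be as simple as a couple of INSERTs against the placeholder schema I sketched at the top. The names and values here are made up purely for illustration:

-- Illustrative only, against the placeholder schema above.
-- Register the machine and benchmark once (no-op if they already exist) ...
INSERT INTO machine (machine_name, cpu_model)
VALUES ('example-box-01', 'Example CPU @ 3.2 GHz')
ON CONFLICT (machine_name) DO NOTHING;

INSERT INTO benchmark (benchmark_name, languages)
VALUES ('example-builder-benchmark', '{C++}')
ON CONFLICT (benchmark_name) DO NOTHING;

-- ... then append one row per result.
INSERT INTO benchmark_run (benchmark_id, machine_id, run_timestamp,
                           git_commit, elapsed_s, mean_s, stddev_s)
SELECT b.benchmark_id, m.machine_id, now(),
       '<commit hash>', 0.0123, 0.0121, 0.0004
FROM benchmark b, machine m
WHERE b.benchmark_name = 'example-builder-benchmark'
  AND m.machine_name = 'example-box-01';

A thin wrapper around each benchmarking framework's output could emit statements like these, and the same connection string would work from TeamCity agents and home machines alike. Happy to flesh this out in a Jira ticket if that's easier.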