Areg,

If you'd like help, I volunteer! I have no benchmarking experience but
plenty of database experience. I can mock up the backend (database +
HTTP) as a starting point for discussion if this is the way people want
to go.
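
For example, a throwaway mock of the HTTP ingestion side could look
something like the sketch below (Flask, the route, and the in-memory
storage are placeholders just to make the discussion concrete):

    # Throwaway mock: accept a JSON benchmark result over HTTP and keep
    # it in an in-memory list. Nothing here is a proposed design.
    from flask import Flask, jsonify, request

    app = Flask(__name__)
    RESULTS = []  # stand-in for the real benchmark database

    @app.route("/benchmarks", methods=["POST"])
    def submit_benchmark():
        RESULTS.append(request.get_json(force=True))
        return jsonify({"stored": len(RESULTS)}), 201

    @app.route("/benchmarks", methods=["GET"])
    def list_benchmarks():
        return jsonify(RESULTS)

    if __name__ == "__main__":
        app.run(port=5000)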

Is there a Jira ticket for this that I can jump into?

On Sun, Jan 20, 2019 at 3:24 PM Wes McKinney <wesmck...@gmail.com> wrote:

> hi Areg,
>
> This sounds great -- we've discussed building a more full-featured
> benchmark automation system in the past but nothing has been developed
> yet.
>
> The details of your proposal sound OK; the single most important
> thing to me is that we build and maintain a very general-purpose
> database schema for the historical benchmark database.
>
> The benchmark database should keep track of:
>
> * Timestamp of benchmark run
> * Git commit hash of codebase
> * Machine unique name (sort of the "user id")
> * CPU identification for machine, and clock frequency (in case of
> overclocking)
> * CPU cache sizes (L1/L2/L3)
> * Whether or not CPU throttling is enabled (if it can be easily determined)
> * RAM size
> * GPU identification (if any)
> * Benchmark unique name
> * Programming language(s) associated with benchmark (e.g. a benchmark
> may involve both C++ and Python)
> * Benchmark time, plus mean and standard deviation if available, else NULL
>
> (maybe some other things)
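>
> To make that concrete, here is a rough sketch of a single results
> table, using psycopg2 only as an illustration (none of the column
> names or types below are meant to be final):
>
>     # Sketch only: create the results table in PostgreSQL. Columns
>     # are placeholders mirroring the list above.
>     import psycopg2
>
>     DDL = """
>     CREATE TABLE IF NOT EXISTS benchmark_runs (
>         id BIGSERIAL PRIMARY KEY,
>         run_timestamp TIMESTAMPTZ NOT NULL,
>         git_commit_hash TEXT NOT NULL,
>         machine_name TEXT NOT NULL,
>         cpu_model TEXT,
>         cpu_frequency_hz BIGINT,
>         l1_cache_bytes INTEGER,
>         l2_cache_bytes INTEGER,
>         l3_cache_bytes INTEGER,
>         cpu_throttling_enabled BOOLEAN,
>         ram_bytes BIGINT,
>         gpu_model TEXT,
>         benchmark_name TEXT NOT NULL,
>         benchmark_languages TEXT[],
>         value_seconds DOUBLE PRECISION,
>         mean_seconds DOUBLE PRECISION,
>         stddev_seconds DOUBLE PRECISION
>     );
>     """
>
>     with psycopg2.connect("dbname=arrow_benchmarks") as conn:
>         with conn.cursor() as cur:
>             cur.execute(DDL)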
>
> I would rather not be locked into the internal database schema of a
> particular benchmarking tool, so people in the community can just run
> SQL queries against the database and use the data however they like.
> We'll just have to be careful that people don't DROP TABLE or DELETE
> rows (but we should have daily backups so we can recover from such
> cases).
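>
> One way to reduce that risk (a sketch only; the role name and DSN are
> placeholders, and the table is the one sketched above) is to hand the
> community a read-only role and keep write credentials on the CI side:
>
>     # Sketch: community SQL access goes through a SELECT-only role.
>     import psycopg2
>
>     GRANTS = """
>     CREATE ROLE benchmark_reader LOGIN PASSWORD 'change-me';
>     GRANT CONNECT ON DATABASE arrow_benchmarks TO benchmark_reader;
>     GRANT USAGE ON SCHEMA public TO benchmark_reader;
>     GRANT SELECT ON ALL TABLES IN SCHEMA public TO benchmark_reader;
>     """
>
>     # Example of the kind of ad hoc query people could then run.
>     EXAMPLE_QUERY = """
>     SELECT git_commit_hash, run_timestamp, mean_seconds
>     FROM benchmark_runs
>     WHERE benchmark_name = %s AND machine_name = %s
>     ORDER BY run_timestamp DESC
>     LIMIT 50;
>     """
>
>     with psycopg2.connect("dbname=arrow_benchmarks user=admin") as conn:
>         with conn.cursor() as cur:
>             cur.execute(GRANTS)
>             cur.execute(EXAMPLE_QUERY, ("example-benchmark", "machine-1"))
>             print(cur.fetchall())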
>
> So while we may make use of TeamCity to schedule the runs in the
> cloud and on physical hardware, we should also provide a path for
> other people in the community to add data to the benchmark database
> from their own hardware on an ad hoc basis. For example, I have
> several machines at home running all the major operating systems
> (Windows / macOS / Linux, and soon also ARM64), and I'd like to set
> up scheduled tasks / cron jobs to report in to the database at least
> daily.
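>
> For that ad hoc piece, each machine could run a small reporter script
> from cron (e.g. "0 3 * * * python report_benchmarks.py"); a minimal
> sketch, again assuming the placeholder table and psycopg2 from above:
>
>     # Sketch of a daily reporter a machine owner could run from cron.
>     import datetime
>     import subprocess
>
>     import psycopg2
>
>     DSN = "dbname=arrow_benchmarks user=benchmark_writer"  # placeholder
>
>     def current_commit(repo_path="arrow"):
>         # Commit hash of the checkout that was benchmarked.
>         return subprocess.check_output(
>             ["git", "-C", repo_path, "rev-parse", "HEAD"], text=True
>         ).strip()
>
>     def report(machine_name, results):
>         # results: iterable of (name, languages, value, mean, stddev)
>         with psycopg2.connect(DSN) as conn, conn.cursor() as cur:
>             for name, languages, value, mean, stddev in results:
>                 cur.execute(
>                     """
>                     INSERT INTO benchmark_runs
>                         (run_timestamp, git_commit_hash, machine_name,
>                          benchmark_name, benchmark_languages,
>                          value_seconds, mean_seconds, stddev_seconds)
>                     VALUES (%s, %s, %s, %s, %s, %s, %s, %s)
>                     """,
>                     (datetime.datetime.utcnow(), current_commit(),
>                      machine_name, name, languages, value, mean, stddev),
>                 )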
>
> Ideally the benchmark database would just be a PostgreSQL server with
> a schema we write down, keep backed up, etc. Hosted PostgreSQL is
> inexpensive ($200+ per year depending on the size of the instance;
> this probably doesn't need to be a crazy big machine).
>
> I suspect there will be a manageable amount of development involved
> in gluing each of the benchmarking frameworks together with the
> benchmark database. This glue code can also handle querying the
> operating system for the system information listed above.
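>
> A sketch of that system-information piece, assuming psutil and
> py-cpuinfo as helper libraries (just assumptions, not decided
> dependencies):
>
>     # Sketch: collect the machine metadata listed above.
>     import platform
>
>     import cpuinfo  # from the py-cpuinfo package
>     import psutil
>
>     def collect_machine_info():
>         cpu = cpuinfo.get_cpu_info()
>         freq = psutil.cpu_freq()
>         return {
>             "machine_name": platform.node(),
>             "os": platform.platform(),
>             "cpu_model": cpu.get("brand_raw", platform.processor()),
>             "cpu_frequency_hz": int(freq.max * 1e6) if freq else None,
>             "l2_cache_bytes": cpu.get("l2_cache_size"),
>             "l3_cache_bytes": cpu.get("l3_cache_size"),
>             "ram_bytes": psutil.virtual_memory().total,
>         }
>
>     print(collect_machine_info())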
>
> Thanks
> Wes
>
> On Fri, Jan 18, 2019 at 12:14 AM Melik-Adamyan, Areg
> <areg.melik-adam...@intel.com> wrote:
> >
> > Hello,
> >
> > I want to restart, or attach to, the discussion about creating an
> > Arrow benchmarking dashboard. I propose running the performance
> > benchmarks on every commit to track changes.
> > The proposal includes building infrastructure for per-commit
> > tracking, consisting of the following parts:
> > - JetBrains' hosted TeamCity for OSS (https://teamcity.jetbrains.com/)
> > as the build system
> > - Agents running in the cloud as VMs/containers (DigitalOcean or
> > others), on bare metal (Packet.net/AWS), and on-premise (NVIDIA
> > boxes?)
> > - JFrog Artifactory for OSS projects
> > (https://jfrog.com/open-source/#artifactory2) for artifact storage
> > and management
> > - Codespeed as the frontend (https://github.com/tobami/codespeed)
> >
> > I am volunteering to build such a system (more Intel folks will be
> > involved if needed) so we can start tracking performance on various
> > platforms and understand how changes affect it.
> >
> > Please let me know your thoughts!
> >
> > Thanks,
> > -Areg.
> >
> >
> >
>
