Sorry, copy-paste failure: https://issues.apache.org/jira/browse/ARROW-4313
On Mon, Jan 21, 2019 at 11:14 AM Wes McKinney <wesmck...@gmail.com> wrote:
>
> I don't think there is one but I just created
> https://lists.apache.org/thread.html/278e573445c83bbd8ee66474b9356c5291a16f6b6eca11dbbe4b473a@%3Cdev.arrow.apache.org%3E
>
> On Mon, Jan 21, 2019 at 10:35 AM Tanya Schlusser <ta...@tickel.net> wrote:
> >
> > Areg,
> >
> > If you'd like help, I volunteer! No experience benchmarking, but tons of
> > experience databasing—I can mock the backend (database + HTTP) as a
> > starting point for discussion if this is the way people want to go.
> >
> > Is there a Jira ticket for this that I can jump into?
> >
> > On Sun, Jan 20, 2019 at 3:24 PM Wes McKinney <wesmck...@gmail.com> wrote:
> > >
> > > hi Areg,
> > >
> > > This sounds great -- we've discussed building a more full-featured
> > > benchmark automation system in the past, but nothing has been developed
> > > yet.
> > >
> > > Your proposal about the details sounds OK; the single most important
> > > thing to me is that we build and maintain a very general-purpose
> > > database schema for the historical benchmark database.
> > >
> > > The benchmark database should keep track of:
> > >
> > > * Timestamp of benchmark run
> > > * Git commit hash of codebase
> > > * Machine unique name (sort of the "user id")
> > > * CPU identification for machine, and clock frequency (in case of
> > >   overclocking)
> > > * CPU cache sizes (L1/L2/L3)
> > > * Whether or not CPU throttling is enabled (if it can be easily
> > >   determined)
> > > * RAM size
> > > * GPU identification (if any)
> > > * Benchmark unique name
> > > * Programming language(s) associated with the benchmark (e.g. a
> > >   benchmark may involve both C++ and Python)
> > > * Benchmark time, plus mean and standard deviation if available,
> > >   else NULL
> > >
> > > (maybe some other things)
> > >
> > > I would rather not be locked into the internal database schema of a
> > > particular benchmarking tool.
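[Editor's note: the field list above might translate into a relational schema roughly like the following sketch. All table and column names here are invented for illustration, not anything agreed on the list; the thread proposes PostgreSQL, but SQLite is used so the example is self-contained.]

```python
import sqlite3

# Hypothetical sketch of the benchmark database schema described above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS machine (
    machine_id       INTEGER PRIMARY KEY,
    machine_name     TEXT UNIQUE NOT NULL,  -- the "user id"
    cpu_model        TEXT,
    cpu_frequency_hz INTEGER,               -- nominal clock (overclocking)
    l1_cache_bytes   INTEGER,
    l2_cache_bytes   INTEGER,
    l3_cache_bytes   INTEGER,
    cpu_throttling   INTEGER,               -- 1/0, NULL if undetermined
    ram_bytes        INTEGER,
    gpu_model        TEXT                   -- NULL if no GPU
);
CREATE TABLE IF NOT EXISTS benchmark (
    benchmark_id     INTEGER PRIMARY KEY,
    benchmark_name   TEXT UNIQUE NOT NULL,
    languages        TEXT                   -- e.g. "C++,Python"
);
CREATE TABLE IF NOT EXISTS benchmark_run (
    run_id        INTEGER PRIMARY KEY,
    benchmark_id  INTEGER NOT NULL REFERENCES benchmark(benchmark_id),
    machine_id    INTEGER NOT NULL REFERENCES machine(machine_id),
    run_timestamp TEXT NOT NULL,            -- ISO 8601
    git_commit    TEXT NOT NULL,
    elapsed_s     REAL,
    mean_s        REAL,                     -- NULL if unavailable
    stddev_s      REAL                      -- NULL if unavailable
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Community members could then query the data directly, e.g. the
# history of one benchmark on one machine ("parquet-read" and
# "wes-desktop" are made-up names):
query = """
SELECT r.run_timestamp, r.git_commit, r.elapsed_s
FROM benchmark_run r
JOIN benchmark b ON b.benchmark_id = r.benchmark_id
JOIN machine m   ON m.machine_id = r.machine_id
WHERE b.benchmark_name = ? AND m.machine_name = ?
ORDER BY r.run_timestamp
"""
rows = conn.execute(query, ("parquet-read", "wes-desktop")).fetchall()
print(rows)  # empty until runs are reported
```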
So people in the community can just run
> > > SQL queries against the database and use the data however they like.
> > > We'll just have to be careful that people don't DROP TABLE or DELETE
> > > (but we should have daily backups so we can recover from such cases).
> > >
> > > So while we may make use of TeamCity to schedule the runs on the cloud
> > > and physical hardware, we should also provide a path for other people
> > > in the community to add data to the benchmark database on their own
> > > hardware on an ad hoc basis. For example, I have several machines in
> > > my home on all operating systems (Windows / macOS / Linux, and soon
> > > also ARM64), and I'd like to set up scheduled tasks / cron jobs to
> > > report in to the database at least on a daily basis.
> > >
> > > Ideally the benchmark database would just be a PostgreSQL server with
> > > a schema we write down, keep backed up, etc. Hosted PostgreSQL is
> > > inexpensive ($200+ per year depending on the size of the instance;
> > > this probably doesn't need to be a crazy big machine).
> > >
> > > I suspect there will be a manageable amount of development involved
> > > in gluing each of the benchmarking frameworks to the benchmark
> > > database. This glue can also handle querying the operating system
> > > for the system information listed above.
> > >
> > > Thanks
> > > Wes
> > >
> > > On Fri, Jan 18, 2019 at 12:14 AM Melik-Adamyan, Areg
> > > <areg.melik-adam...@intel.com> wrote:
> > > >
> > > > Hello,
> > > >
> > > > I want to restart/attach to the discussion about creating an Arrow
> > > > benchmarking dashboard. I propose a per-commit performance
> > > > benchmark run to track changes.
> > > > The proposal includes building infrastructure for per-commit
> > > > tracking, comprising the following parts:
> > > > - Hosted JetBrains TeamCity for OSS (https://teamcity.jetbrains.com/)
> > > >   as the build system
> > > > - Agents running in the cloud, both VM/container (DigitalOcean, or
> > > >   others) and bare-metal (Packet.net/AWS), and on-premise (Nvidia
> > > >   boxes?)
> > > > - JFrog Artifactory storage and management for OSS projects
> > > >   (https://jfrog.com/open-source/#artifactory2)
> > > > - Codespeed as a frontend (https://github.com/tobami/codespeed)
> > > >
> > > > I am volunteering to build such a system (if needed, more Intel
> > > > folks will be involved) so we can start tracking performance on
> > > > various platforms and understand how changes affect it.
> > > >
> > > > Please, let me know your thoughts!
> > > >
> > > > Thanks,
> > > > -Areg.
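[Editor's note: Wes's suggestion above — ad hoc machines reporting in via cron jobs, with glue code querying the OS for system information — could be sketched roughly as below. This is a best-effort sketch using only the Python standard library; the function names are invented, the cache-size/throttling/GPU fields are deliberately omitted as platform-specific, and the actual reporting endpoint is left out.]

```python
import json
import os
import platform
import subprocess
from datetime import datetime, timezone

def collect_system_info():
    """Gather a best-effort subset of the machine facts listed in the
    thread (machine name, OS, CPU model, core count). Cache sizes,
    clock frequency, and throttling state are platform-specific and
    intentionally not attempted here."""
    return {
        "machine_name": platform.node(),
        "os": platform.system(),
        "cpu_model": platform.processor() or platform.machine(),
        "cpu_count": os.cpu_count(),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

def current_git_commit(repo_dir="."):
    """Commit hash of the checkout being benchmarked (assumes git is
    installed and repo_dir is inside a git repository)."""
    out = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir, capture_output=True, text=True,
    )
    return out.stdout.strip() or None

if __name__ == "__main__":
    # A scheduled task / cron job could run this and POST the JSON,
    # together with benchmark timings, to the benchmark database's
    # ingestion endpoint.
    print(json.dumps(collect_system_info(), indent=2))
```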