Currently, there are 3 snowflakes :)

- Benchmark setup: https://github.com/TomAugspurger/asv-runner
  + Some setup to bootstrap a clean install with airflow, conda, asv, supervisor, etc. All the infrastructure around running the benchmarks.
  + Each project adds itself to the list of benchmarks, as in https://github.com/TomAugspurger/asv-runner/pull/3. Then things are re-deployed. Deployment requires ansible and an SSH key for the benchmark machine.
- Benchmark publishing: after running all the benchmarks, the results are collected and pushed to https://github.com/tomaugspurger/asv-collection (a rough sketch of this step follows the list).
- Benchmark hosting: a cron job on the server hosting the pandas docs pulls https://github.com/tomaugspurger/asv-collection and serves the results from the `/speed` directory.
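For concreteness, here is a minimal sketch of what the "collect and push" publishing step could look like. The directory layout, repository paths, and the `collect_and_push` helper are assumptions for illustration only; they are not taken from asv-runner itself.

```python
#!/usr/bin/env python3
"""Hypothetical sketch of the benchmark-publishing step.

Assumed layout (not copied from asv-runner): each project's rendered
asv HTML lives under ~/asv-runner/results/<project>/html, and
~/asv-collection is a local clone of
https://github.com/tomaugspurger/asv-collection.
"""
import shutil
import subprocess
from pathlib import Path

ASV_OUTPUT = Path.home() / "asv-runner" / "results"   # assumed output location
COLLECTION = Path.home() / "asv-collection"           # assumed local clone


def collect_and_push(projects):
    """Copy each project's rendered asv HTML into the collection repo and
    push it, so the docs server's cron job can pick it up."""
    for name in projects:
        src = ASV_OUTPUT / name / "html"
        dst = COLLECTION / name
        if dst.exists():
            shutil.rmtree(dst)
        shutil.copytree(src, dst)

    # Only commit and push if something actually changed.
    status = subprocess.run(["git", "status", "--porcelain"], cwd=COLLECTION,
                            capture_output=True, text=True, check=True)
    if not status.stdout.strip():
        return
    subprocess.run(["git", "add", "-A"], cwd=COLLECTION, check=True)
    subprocess.run(["git", "commit", "-m", "Update benchmark results"],
                   cwd=COLLECTION, check=True)
    subprocess.run(["git", "push", "origin", "master"], cwd=COLLECTION, check=True)


if __name__ == "__main__":
    collect_and_push(["pandas", "dask"])
```

Keeping the aggregated results in a plain git repository means the hosting side stays simple: the docs server only needs a `git pull` in a cron entry to keep `/speed` current.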
There are many things that could be improved here, but I personally won't have time in the near term. Happy to assist, though.

On Mon, Apr 23, 2018 at 10:15 AM, Wes McKinney <wesmck...@gmail.com> wrote:

> hi Tom -- is the publishing workflow for this documented someplace, or
> available in a GitHub repo? We want to make sure we don't accumulate
> any "snowflakes" in the development process.
>
> thanks!
> Wes
>
> On Fri, Apr 13, 2018 at 8:36 AM, Tom Augspurger
> <tom.augspurge...@gmail.com> wrote:
> > They are run daily and published to http://pandas.pydata.org/speed/
> >
> > ________________________________
> > From: Antoine Pitrou <anto...@python.org>
> > Sent: Friday, April 13, 2018 4:28:11 AM
> > To: dev@arrow.apache.org
> > Subject: Re: Continuous benchmarking setup
> >
> > Nice! Are the benchmark results published somewhere?
> >
> > On 13/04/2018 at 02:50, Tom Augspurger wrote:
> >> https://github.com/TomAugspurger/asv-runner/ is the setup for the
> >> projects currently running. Adding arrow to
> >> https://github.com/TomAugspurger/asv-runner/blob/master/tests/full.yml
> >> might work. I'll have to redeploy with the update.
> >>
> >> ________________________________
> >> From: Wes McKinney <wesmck...@gmail.com>
> >> Sent: Thursday, April 12, 2018 7:24:20 PM
> >> To: dev@arrow.apache.org
> >> Subject: Re: Continuous benchmarking setup
> >>
> >> hi Antoine,
> >>
> >> I have a bare metal machine at home (affectionately known as the
> >> "pandabox") that's available via SSH that we've been using for
> >> continuous benchmarking for other projects. Arrow is welcome to use
> >> it. I can give you access to the machine if you would like. Hopefully,
> >> we can suitably document the process of setting up a continuous
> >> benchmarking machine so that if we need to migrate to a new machine,
> >> it is not too much of a hardship to do so.
> >>
> >> Thanks
> >> Wes
> >>
> >> On Wed, Apr 11, 2018 at 9:40 AM, Antoine Pitrou <anto...@python.org> wrote:
> >>>
> >>> Hello
> >>>
> >>> With the following changes, it seems we might reach the point where
> >>> we're able to run the Python-based benchmark suite across multiple
> >>> commits (at least the ones not earlier than those changes):
> >>> https://github.com/apache/arrow/pull/1775
> >>>
> >>> To make this truly useful, we would need a dedicated host. Ideally a
> >>> (Linux) OS running on bare metal, with SMT/HyperThreading disabled.
> >>> If running virtualized, the VM should have dedicated physical CPU cores.
> >>>
> >>> That machine would run the benchmarks on a regular basis (perhaps once
> >>> per night) and publish the results in static HTML form somewhere.
> >>>
> >>> (note: nice to have in the future might be access to NVidia hardware,
> >>> but right now there are no CUDA benchmarks in the Python benchmarks)
> >>>
> >>> What should be the procedure here?
> >>>
> >>> Regards
> >>>
> >>> Antoine.
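For readers who haven't used asv: the "Python-based benchmark suite" discussed above is a set of classes that asv discovers and times. Below is a minimal sketch of what one such benchmark looks like, assuming pyarrow and numpy are installed; the class name, parameter values, and the operation being timed are illustrative, not copied from the actual suite in apache/arrow.

```python
# Minimal asv-style benchmark sketch. asv imports modules from the
# project's benchmark directory, instantiates classes like this one,
# calls setup() once per parameter combination, and repeatedly times
# every method whose name starts with `time_`.
import numpy as np
import pyarrow as pa


class TimeArrayFromNumPy:
    # Input sizes to parametrize over (illustrative values).
    params = [10_000, 1_000_000]
    param_names = ["length"]

    def setup(self, length):
        # Build the input outside the timed section.
        self.data = np.random.randn(length)

    def time_array_from_numpy(self, length):
        # Conversion from a NumPy array to an Arrow array.
        pa.array(self.data)
```

A dedicated machine of the kind Wes describes would then, roughly, run `asv run` against new commits on a nightly schedule and `asv publish` to render the static HTML that ends up served under `/speed`.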