On 20/02/2019 at 18:55, Melik-Adamyan, Areg wrote:
> 2. The unit-test framework (google/benchmark) can effectively report the needed benchmark data in textual format, with a preamble containing information about the machine on which the benchmarks are run.

On this topic, gbenchmark can actually output JSON, e.g.:

    ./build/release/arrow-utf8-util-benchmark --benchmark_out=results.json --benchmark_out_format=json

Here is what the JSON output looks like:
https://gist.github.com/pitrou/e055b454f333adf3c16325613c716309

Using this data, it should be easy to write an ingestion script that massages it into the format the database expects.
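For instance, a minimal ingestion sketch could look like the following. The `benchmark_run` table and its columns are just placeholders here, and gbenchmark's exact JSON field names can vary between versions, so treat this as a rough outline rather than a finished script:

    # Rough sketch: parse gbenchmark JSON output and insert rows into a
    # hypothetical PostgreSQL table. Table and column names are placeholders.
    import json
    import psycopg2

    def ingest(json_path, dsn):
        with open(json_path) as f:
            report = json.load(f)

        context = report.get("context", {})        # machine/run preamble
        benchmarks = report.get("benchmarks", [])  # one entry per benchmark

        conn = psycopg2.connect(dsn)
        with conn, conn.cursor() as cur:
            for bench in benchmarks:
                cur.execute(
                    """
                    INSERT INTO benchmark_run
                        (run_timestamp, machine_name, benchmark_name,
                         real_time, cpu_time, time_unit, iterations)
                    VALUES (%s, %s, %s, %s, %s, %s, %s)
                    """,
                    (
                        context.get("date"),
                        context.get("host_name"),
                        bench.get("name"),
                        bench.get("real_time"),
                        bench.get("cpu_time"),
                        bench.get("time_unit"),
                        bench.get("iterations"),
                    ),
                )
        conn.close()

    if __name__ == "__main__":
        ingest("results.json", "dbname=benchmarks user=arrow")

The "context" block of the same JSON also carries machine details (CPU count, clock frequency, and in recent gbenchmark versions the cache sizes), which is the kind of information discussed for the machine-identification columns later in this thread.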
> - Disallow entering any single benchmark run into the central repo, as single runs do not mean much in the context of continuous and statistically relevant measurements. [...]
> - Mandate that contributors have a dedicated environment for measurements.

I have no strong opinion on this. Another possibility is to regard one set of machines (e.g. Intel- or Ursa Labs-provided benchmarking machines, such as the DGX machines currently at Wes' office) as the reference for tracking regressions, and other machines as just informational.

That said, I think you're right that it doesn't sound very useful to allow arbitrary benchmark result submissions. However, I think there could still be a separate test database instance, to allow easy testing of ingestion or reporting scripts.

Regards

Antoine.

> 3. So with environments set up and regular runs you have all the artifacts, though not in a very comprehensible format. The reason to set up a dashboard is to let the data be consumed and to track the performance of various parts from a historical perspective, much more nicely and with visualizations.
>
> And here are the scope restrictions I have in mind:
> - Disallow entering any single benchmark run into the central repo, as single runs do not mean much in the context of continuous and statistically relevant measurements. What information do you get if someone reports a single run? You do not know how cleanly it was done and, more importantly, whether it is possible to reproduce it elsewhere. That is why, whether it is better, worse or the same, you cannot compare it with the data already in the DB.
> - Mandate that contributors have a dedicated environment for measurements. Otherwise they can use TeamCity to run and parse data and publish it on their own site. Data that enters the Arrow performance DB becomes Arrow community-owned data, and it becomes the community's job to answer why certain things got better or worse.
> - Because the number of CPU/GPU/accelerator flavors is huge, we cannot satisfy all the needs upfront and create a DB that covers all the possible variants. I think we should have simple CPU and GPU configs now, even if they are not perfect. By simple I mean the basic brand string; that should be enough. Having all the detailed info in the DB does not make sense: in my experience you never use it, you use the CPUID/brand name to get the info you need.
> - Scope and requirements will change over time, and going big now will make things complicated later. So I think it would be beneficial to have something quick up and running, get a better understanding of our needs and gaps, and go from there.
>
> The needed infra is already up on AWS, so as soon as we resolve the DNS and key exchange issues we can launch.
>
> -Areg.
>
> -----Original Message-----
> From: Tanya Schlusser [mailto:ta...@tickel.net]
> Sent: Thursday, February 7, 2019 4:40 PM
> To: dev@arrow.apache.org
> Subject: Re: Benchmarking dashboard proposal
>
> Late, but there's a PR now with a first-draft DDL (https://github.com/apache/arrow/pull/3586). Happy to receive any feedback!
>
> I tried to think about how people would submit benchmarks, and added a Postgraphile container for http-via-GraphQL. If others have strong opinions on the data modeling please speak up, because I'm more a database user than a designer.
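For illustration, submitting a result through such a GraphQL endpoint might look roughly like the sketch below. The endpoint URL, mutation name and input fields are hypothetical (PostGraphile derives them from whatever schema the DDL ends up defining), so this only shows the general shape of an HTTP submission:

    # Hypothetical sketch of posting one benchmark result to a PostGraphile
    # GraphQL endpoint. Mutation and field names depend on the actual schema.
    import requests

    MUTATION = """
    mutation ($input: CreateBenchmarkRunInput!) {
      createBenchmarkRun(input: $input) {
        benchmarkRun { id }
      }
    }
    """

    def submit(result, endpoint="http://localhost:5000/graphql"):
        payload = {
            "query": MUTATION,
            "variables": {"input": {"benchmarkRun": result}},
        }
        resp = requests.post(endpoint, json=payload, timeout=30)
        resp.raise_for_status()
        return resp.json()

    if __name__ == "__main__":
        submit({
            "benchmarkName": "BenchmarkUTF8Validation",
            "machineName": "my-workstation",
            "gitCommit": "abcdef0",
            "realTime": 123.4,
            "timeUnit": "ns",
        })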
> I can also help with benchmarking work in R/Python given guidance / a roadmap / examples from someone else.
>
> Best,
> Tanya
>
> On Mon, Feb 4, 2019 at 12:37 PM Tanya Schlusser <ta...@tickel.net> wrote:
>
>> I hope to make a PR with the DDL by tomorrow or Wednesday night — the DDL along with a README in a new directory `arrow/dev/benchmarking`, unless directed otherwise.
>>
>> A "C++ Benchmark Collector" script would be super. I expect some back-and-forth on this to identify naïve assumptions in the data model. Attempting to submit actual benchmarks is how to get a handle on that. I recognize I'm blocking downstream work; better to get an initial PR and some discussion going.
>>
>> Best,
>> Tanya
>>
>> On Mon, Feb 4, 2019 at 10:10 AM Wes McKinney <wesmck...@gmail.com> wrote:
>>
>>> hi folks,
>>>
>>> I'm curious where we currently stand on this project. I see the discussion in https://issues.apache.org/jira/browse/ARROW-4313 -- would the next step be to have a pull request with .sql files containing the DDL required to create the schema in PostgreSQL?
>>>
>>> I could volunteer to write the "C++ Benchmark Collector" script that will run all the benchmarks on Linux and collect their data to be inserted into the database.
>>>
>>> Thanks
>>> Wes
>>>
>>> On Sun, Jan 27, 2019 at 12:20 AM Tanya Schlusser <ta...@tickel.net> wrote:
>>>>
>>>> I don't want to be the bottleneck and have posted an initial draft data model in the JIRA issue https://issues.apache.org/jira/browse/ARROW-4313
>>>>
>>>> It should not be a problem to get the content into a form that would be acceptable either for a static site like ASV (via CORS queries to a GraphQL/REST interface) or for a codespeed-style site (via a separate schema organized for Django).
>>>>
>>>> I don't think I'm experienced enough to actually write any benchmarks, though, so all I can contribute is backend work for this task.
>>>>
>>>> Best,
>>>> Tanya
>>>>
>>>> On Sat, Jan 26, 2019 at 7:37 PM Wes McKinney <wesmck...@gmail.com> wrote:
>>>>
>>>>> hi folks,
>>>>>
>>>>> I'd like to propose some kind of timeline for getting a first iteration of a benchmark database developed and live, with scripts to enable one or more initial agents to start adding new data on a daily / per-commit basis. I have at least 3 physical machines where I could immediately set up cron jobs to start adding new data, and I could attempt to backfill data as far back as possible.
>>>>>
>>>>> Personally, I would like to see this done by the end of February if not sooner -- if we don't have the volunteers to push the work to completion by then, please let me know as I will rearrange my priorities to make sure that it happens. Does that sound reasonable?
>>>>>
>>>>> Please let me know if this plan sounds reasonable:
>>>>>
>>>>> * Set up a hosted PostgreSQL instance, configure backups
>>>>> * Propose and adopt a database schema for storing benchmark results
>>>>> * For C++, write a script (or Dockerfile) to execute all google-benchmarks and output results to JSON, then an adapter script (Python) to ingest them into the database (a rough collector sketch follows below)
>>>>> * For Python, a similar script that invokes ASV, then inserts the ASV results into the benchmark database
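As a rough idea of what the C++ side of this could look like, here is a minimal collector sketch. The build directory layout and the `*-benchmark` naming convention are assumptions based on the example earlier in the thread, and the ingestion step is left as a stub:

    # Sketch of a "C++ Benchmark Collector": run each google-benchmark
    # executable with JSON output and gather the reports for ingestion.
    # Paths and naming conventions are assumptions, not settled decisions.
    import glob
    import json
    import os
    import subprocess

    BUILD_DIR = "build/release"        # assumed location of benchmark binaries
    OUTPUT_DIR = "benchmark-results"   # where the JSON reports are collected

    def run_all_benchmarks():
        os.makedirs(OUTPUT_DIR, exist_ok=True)
        reports = []
        for exe in sorted(glob.glob(os.path.join(BUILD_DIR, "*-benchmark"))):
            out_path = os.path.join(OUTPUT_DIR, os.path.basename(exe) + ".json")
            subprocess.run(
                [exe,
                 "--benchmark_out={}".format(out_path),
                 "--benchmark_out_format=json"],
                check=True,
            )
            with open(out_path) as f:
                reports.append(json.load(f))
        return reports

    if __name__ == "__main__":
        for report in run_all_benchmarks():
            # TODO: hand each report to the database ingestion script
            print(report.get("context", {}).get("executable"),
                  len(report.get("benchmarks", [])), "benchmarks")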
>>>>> This seems to be a prerequisite for having a front end to visualize the results, but the dashboard/front end can hopefully be implemented in such a way that it is not too tightly coupled to the details of the benchmark database.
>>>>>
>>>>> (Do we have any other benchmarks in the project that would need to be inserted initially?)
>>>>>
>>>>> Related work to trigger benchmarks on agents when new commits land in master can happen concurrently -- one task need not block the other.
>>>>>
>>>>> Thanks
>>>>> Wes
>>>>>
>>>>> On Mon, Jan 21, 2019 at 11:14 AM Wes McKinney <wesmck...@gmail.com> wrote:
>>>>>>
>>>>>> Sorry, copy-paste failure: https://issues.apache.org/jira/browse/ARROW-4313
>>>>>>
>>>>>> On Mon, Jan 21, 2019 at 11:14 AM Wes McKinney <wesmck...@gmail.com> wrote:
>>>>>>>
>>>>>>> I don't think there is one but I just created
>>>>>>> https://lists.apache.org/thread.html/278e573445c83bbd8ee66474b9356c5291a16f6b6eca11dbbe4b473a@%3Cdev.arrow.apache.org%3E
>>>>>>>
>>>>>>> On Mon, Jan 21, 2019 at 10:35 AM Tanya Schlusser <ta...@tickel.net> wrote:
>>>>>>>>
>>>>>>>> Areg,
>>>>>>>>
>>>>>>>> If you'd like help, I volunteer! No experience benchmarking, but tons of experience databasing -- I can mock up the backend (database + http) as a starting point for discussion, if this is the way people want to go.
>>>>>>>>
>>>>>>>> Is there a Jira ticket for this that I can jump into?
>>>>>>>>
>>>>>>>> On Sun, Jan 20, 2019 at 3:24 PM Wes McKinney <wesmck...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> hi Areg,
>>>>>>>>>
>>>>>>>>> This sounds great -- we've discussed building a more full-featured benchmark automation system in the past, but nothing has been developed yet.
>>>>>>>>>
>>>>>>>>> Your proposal about the details sounds OK; the single most important thing to me is that we build and maintain a very general-purpose database schema for the historical benchmark database.
>>>>>>>>>
>>>>>>>>> The benchmark database should keep track of:
>>>>>>>>>
>>>>>>>>> * Timestamp of benchmark run
>>>>>>>>> * Git commit hash of codebase
>>>>>>>>> * Machine unique name (sort of the "user id")
>>>>>>>>> * CPU identification for machine, and clock frequency (in case of overclocking)
>>>>>>>>> * CPU cache sizes (L1/L2/L3)
>>>>>>>>> * Whether or not CPU throttling is enabled (if it can be easily determined)
>>>>>>>>> * RAM size
>>>>>>>>> * GPU identification (if any)
>>>>>>>>> * Benchmark unique name
>>>>>>>>> * Programming language(s) associated with benchmark (e.g. a benchmark may involve both C++ and Python)
>>>>>>>>> * Benchmark time, plus mean and standard deviation if available, else NULL
>>>>>>>>>
>>>>>>>>> (maybe some other things)
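To make the field list above concrete, here is one possible shape for such a table. The actual DDL is being worked out in the pull request mentioned earlier in the thread, so the names and types below are only an illustration of how the listed fields might map onto PostgreSQL columns:

    # Illustrative only: one way the fields listed above could map onto a
    # PostgreSQL table. Names and types are placeholders, not the agreed schema.
    import psycopg2

    DDL = """
    CREATE TABLE IF NOT EXISTS benchmark_run (
        id                 BIGSERIAL PRIMARY KEY,
        run_timestamp      TIMESTAMPTZ NOT NULL,
        git_commit         TEXT NOT NULL,
        machine_name       TEXT NOT NULL,
        cpu_model          TEXT,
        cpu_frequency_mhz  INTEGER,
        l1_cache_kb        INTEGER,
        l2_cache_kb        INTEGER,
        l3_cache_kb        INTEGER,
        cpu_throttling     BOOLEAN,
        ram_gb             INTEGER,
        gpu_model          TEXT,
        benchmark_name     TEXT NOT NULL,
        benchmark_language TEXT,
        value              DOUBLE PRECISION,
        mean               DOUBLE PRECISION,   -- NULL if not reported
        stddev             DOUBLE PRECISION    -- NULL if not reported
    );
    """

    def create_schema(dsn):
        conn = psycopg2.connect(dsn)
        with conn, conn.cursor() as cur:
            cur.execute(DDL)
        conn.close()

    if __name__ == "__main__":
        create_schema("dbname=benchmarks user=arrow")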
>>>>>>>>> I would rather not be locked into the internal database schema of a particular benchmarking tool, so that people in the community can just run SQL queries against the database and use the data however they like. We'll just have to be careful that people don't DROP TABLE or DELETE (but we should have daily backups so we can recover from such cases).
>>>>>>>>>
>>>>>>>>> So while we may make use of TeamCity to schedule the runs on the cloud and physical hardware, we should also provide a path for other people in the community to add data to the benchmark database from their hardware on an ad hoc basis. For example, I have several machines in my home on all operating systems (Windows / macOS / Linux, and soon also ARM64) and I'd like to set up scheduled tasks / cron jobs to report in to the database at least on a daily basis.
>>>>>>>>>
>>>>>>>>> Ideally the benchmark database would just be a PostgreSQL server with a schema we write down and keep backed up, etc. Hosted PostgreSQL is inexpensive ($200+ per year depending on the size of the instance; this probably doesn't need to be a crazy big machine).
>>>>>>>>>
>>>>>>>>> I suspect there will be a manageable amount of development involved in gluing each of the benchmarking frameworks to the benchmark database. That glue can also handle querying the operating system for the system information listed above.
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Wes
>>>>>>>>>
>>>>>>>>> On Fri, Jan 18, 2019 at 12:14 AM Melik-Adamyan, Areg <areg.melik-adam...@intel.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> I want to restart / attach to the discussions about creating an Arrow benchmarking dashboard. I want to propose a per-commit performance benchmark run to track changes.
>>>>>>>>>>
>>>>>>>>>> The proposal includes building infrastructure for per-commit tracking comprising the following parts:
>>>>>>>>>> - JetBrains' hosted TeamCity for OSS https://teamcity.jetbrains.com/ as the build system
>>>>>>>>>> - Agents running both in the cloud as VMs/containers (DigitalOcean, or others) and on bare metal (Packet.net/AWS) and on-premise (Nvidia boxes?)
>>>>>>>>>> - JFrog Artifactory storage and management for OSS projects https://jfrog.com/open-source/#artifactory2
>>>>>>>>>> - Codespeed as a frontend https://github.com/tobami/codespeed
>>>>>>>>>>
>>>>>>>>>> I am volunteering to build such a system (if needed, more Intel folks will be involved) so we can start tracking performance on various platforms and understand how changes affect it.
>>>>>>>>>>
>>>>>>>>>> Please let me know your thoughts!
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> -Areg.