[
https://issues.apache.org/jira/browse/ARROW-4313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755513#comment-16755513
]
Areg Melik-Adamyan commented on ARROW-4313:
-------------------------------------------
Got it. I think those numbers are mostly never used, because you always run
benchmarks at a fixed frequency to get results that are consistent over time.
They can easily be determined from the model name or cpuid, for informational
purposes only, but will never be used in serial benchmarking. In serial
benchmarking everything should be fixed, nailed down, and unchanged except the
variable you are measuring, which is the Arrow code measured through the
benchmark code.
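
To illustrate the "informational only" point, here is a minimal sketch of
capturing CPU identification alongside a run without using it in any
calculation. The function name and dictionary keys are hypothetical and not
part of any Arrow tooling. The strings returned by Python's platform module
vary by operating system and may be empty.

```python
import platform


def cpu_info_for_record():
    """Collect CPU identification to store next to a benchmark result.

    The values are informational only: nothing here is used to scale or
    normalize timings, since the machine is assumed to run at a fixed
    frequency for the whole benchmark series.
    """
    return {
        # Architecture string, e.g. "x86_64"; may be empty if unknown.
        "machine": platform.machine(),
        # Processor description; may be empty on some systems.
        "processor": platform.processor(),
    }


info = cpu_info_for_record()
print(info)
```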
> Define general benchmark database schema
> ----------------------------------------
>
> Key: ARROW-4313
> URL: https://issues.apache.org/jira/browse/ARROW-4313
> Project: Apache Arrow
> Issue Type: New Feature
> Components: Benchmarking
> Reporter: Wes McKinney
> Priority: Major
> Fix For: 0.13.0
>
> Attachments: benchmark-data-model.erdplus, benchmark-data-model.png
>
>
> Some possible attributes that the benchmark database should track, to permit
> heterogeneity of hardware and programming languages
> * Timestamp of benchmark run
> * Git commit hash of codebase
> * Machine unique name (sort of the "user id")
> * CPU identification for machine, and clock frequency (in case of
> overclocking)
> * CPU cache sizes (L1/L2/L3)
> * Whether or not CPU throttling is enabled (if it can be easily determined)
> * RAM size
> * GPU identification (if any)
> * Benchmark unique name
> * Programming language(s) associated with benchmark (e.g. a benchmark
> may involve both C++ and Python)
> * Benchmark time, plus mean and standard deviation if available, else NULL
> See the discussion on the mailing list:
> https://lists.apache.org/thread.html/278e573445c83bbd8ee66474b9356c5291a16f6b6eca11dbbe4b473a@%3Cdev.arrow.apache.org%3E
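
The attributes listed in the issue could map onto tables roughly as in the
sketch below. This is not the schema from the attached data model: the split
into two tables, every table and column name, the units, and the choice of
SQLite are assumptions made purely for illustration.

```python
import sqlite3

# Hypothetical schema: one row per machine, one row per benchmark run.
SCHEMA = """
CREATE TABLE machine (
    machine_id             INTEGER PRIMARY KEY,
    machine_name           TEXT NOT NULL UNIQUE,  -- the "user id"
    cpu_model              TEXT,
    cpu_frequency_hz       INTEGER,
    l1_cache_bytes         INTEGER,
    l2_cache_bytes         INTEGER,
    l3_cache_bytes         INTEGER,
    cpu_throttling_enabled INTEGER,               -- NULL if not determinable
    ram_bytes              INTEGER,
    gpu_model              TEXT                   -- NULL if no GPU
);
CREATE TABLE benchmark_run (
    run_id         INTEGER PRIMARY KEY,
    machine_id     INTEGER NOT NULL REFERENCES machine(machine_id),
    run_timestamp  TEXT NOT NULL,
    git_commit     TEXT NOT NULL,
    benchmark_name TEXT NOT NULL,
    languages      TEXT NOT NULL,                 -- e.g. "C++,Python"
    elapsed_time   REAL NOT NULL,                 -- unit is an assumption
    mean_time      REAL,                          -- NULL if unavailable
    stddev_time    REAL                           -- NULL if unavailable
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Placeholder values only; these are not real measurements.
conn.execute(
    "INSERT INTO machine (machine_name, cpu_model) VALUES (?, ?)",
    ("example-host", "example-cpu"),
)
conn.execute(
    "INSERT INTO benchmark_run (machine_id, run_timestamp, git_commit,"
    " benchmark_name, languages, elapsed_time)"
    " VALUES (?, ?, ?, ?, ?, ?)",
    (1, "2019-01-30T00:00:00Z", "0000000", "example-benchmark", "C++,Python", 1.5),
)

# Mean and standard deviation were not supplied, so they are stored as NULL.
row = conn.execute(
    "SELECT mean_time, stddev_time FROM benchmark_run"
).fetchone()
```

Keeping the hardware attributes in their own table means they are recorded
once per machine rather than repeated for every run.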