[ https://issues.apache.org/jira/browse/ARROW-4313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772241#comment-16772241 ]

Areg Melik-Adamyan commented on ARROW-4313:
-------------------------------------------

[~wesmckinn] and [~pitrou] - need your input.

> My understanding of [this conversation](https://lists.apache.org/thread.html/dcc08ab10507a5139178d7f816c0f5177ff0657546a4ade3ed71ffd5@%3Cdev.arrow.apache.org%3E)
> was that a data model not tied to any ORM tool was the desired path to take.
> 
I think we need to take a step back and sync with @wesm and @pitrou on the 
goals for this little project: 
* For me, the goal is to continuously track the performance of the core C++ 
library and help everybody who is doing performance work catch regressions 
and contribute improvements.
* Do that in a validated form, so we can rely on the numbers.
* There is no goal to provide infrastructure for contributing third-party 
numbers, as they cannot be validated quickly.
* There is no goal to benchmark other languages, as they rely on C++ library 
calls, so you would end up benchmarking the wrapper conversion speed.
* There is no goal, for now, to anticipate and satisfy all possible future 
needs.

The ability of the Arrow test library (practically GTest) to report 
performance numbers for the platform it runs on is more than enough. I would 
not like to restrict users from choosing whatever databases, performance 
monitors, or dashboards they need. I am duplicating this in ARROW-4313 to 
move the discussion out of the code review.

> Define general benchmark database schema
> ----------------------------------------
>
>                 Key: ARROW-4313
>                 URL: https://issues.apache.org/jira/browse/ARROW-4313
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Benchmarking
>            Reporter: Wes McKinney
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.13.0
>
>         Attachments: benchmark-data-model.erdplus, benchmark-data-model.png
>
>          Time Spent: 9h 10m
>  Remaining Estimate: 0h
>
> Some possible attributes that the benchmark database should track, to permit 
> heterogeneity of hardware and programming languages
> * Timestamp of benchmark run
> * Git commit hash of codebase
> * Machine unique name (sort of the "user id")
> * CPU identification for machine, and clock frequency (in case of 
> overclocking)
> * CPU cache sizes (L1/L2/L3)
> * Whether or not CPU throttling is enabled (if it can be easily determined)
> * RAM size
> * GPU identification (if any)
> * Benchmark unique name
> * Programming language(s) associated with benchmark (e.g. a benchmark
> may involve both C++ and Python)
> * Benchmark time, plus mean and standard deviation if available, else NULL
> See the discussion on the mailing list: 
> https://lists.apache.org/thread.html/278e573445c83bbd8ee66474b9356c5291a16f6b6eca11dbbe4b473a@%3Cdev.arrow.apache.org%3E
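The attribute list above could be sketched as a small relational schema. This is only an illustrative sketch, not the agreed-upon design: all table names, column names, and types below are my assumptions, shown here with SQLite for brevity.

```python
import sqlite3

# Hypothetical schema covering the attributes listed in the issue; every
# identifier here is an assumption for illustration, not a proposal.
SCHEMA = """
CREATE TABLE machine (
    machine_id  INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE,  -- machine unique name ("user id")
    cpu_model   TEXT NOT NULL,         -- CPU identification
    cpu_mhz     INTEGER,               -- clock frequency (overclocking)
    l1_kb       INTEGER,               -- CPU cache sizes (L1/L2/L3)
    l2_kb       INTEGER,
    l3_kb       INTEGER,
    throttling  BOOLEAN,               -- CPU throttling, if determinable
    ram_bytes   INTEGER,               -- RAM size
    gpu_model   TEXT                   -- GPU identification, if any
);

CREATE TABLE benchmark (
    benchmark_id INTEGER PRIMARY KEY,
    name         TEXT NOT NULL UNIQUE, -- benchmark unique name
    languages    TEXT NOT NULL         -- e.g. "C++" or "C++,Python"
);

CREATE TABLE run (
    run_id       INTEGER PRIMARY KEY,
    benchmark_id INTEGER NOT NULL REFERENCES benchmark(benchmark_id),
    machine_id   INTEGER NOT NULL REFERENCES machine(machine_id),
    run_at       TEXT NOT NULL,        -- timestamp of benchmark run
    git_hash     TEXT NOT NULL,        -- git commit hash of codebase
    elapsed_s    REAL NOT NULL,        -- benchmark time
    mean_s       REAL,                 -- mean, NULL if unavailable
    stddev_s     REAL                  -- standard deviation, NULL if unavailable
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)  # ['benchmark', 'machine', 'run']
```

Splitting machine and benchmark identity out of the per-run table keeps the heterogeneous hardware and language metadata normalized while each run row stays small.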



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
