[
https://issues.apache.org/jira/browse/ARROW-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16269728#comment-16269728
]
ASF GitHub Bot commented on ARROW-1844:
---------------------------------------
wesm commented on issue #1370: ARROW-1844: [C++] Add initial Unique benchmarks
for int64, variable-length strings
URL: https://github.com/apache/arrow/pull/1370#issuecomment-347702136
Using 0.5 as the load factor threshold for resizing makes things a lot
faster for low-cardinality tables, with minimal impact on high-cardinality
tables:
```
$ ./release/compute-benchmark
Run on (8 X 4185.31 MHz CPU s)
2017-11-28 18:38:38
Benchmark                                                       Time         CPU Iterations
-------------------------------------------------------------------------------------------------
BM_BuildDictionary/min_time:1.000                               1328 us     1328 us  1083 2.93863GB/s
BM_BuildStringDictionary/min_time:1.000                         3143 us     3143 us   446 96.0677MB/s
BM_UniqueInt64NoNulls/16M/50/min_time:1.000/real_time          35761 us    35762 us    39 3.49545GB/s
BM_UniqueInt64NoNulls/16M/1024/min_time:1.000/real_time        69412 us    69414 us    20 1.80085GB/s
BM_UniqueInt64NoNulls/16M/10k/min_time:1.000/real_time         97227 us    97231 us    14 1.28565GB/s
BM_UniqueInt64NoNulls/16M/1024k/min_time:1.000/real_time      457800 us   457806 us     3 279.598MB/s
BM_UniqueInt64WithNulls/16M/50/min_time:1.000/real_time        48785 us    48786 us    29 2.56228GB/s
BM_UniqueInt64WithNulls/16M/1024/min_time:1.000/real_time      82978 us    82981 us    17 1.50642GB/s
BM_UniqueInt64WithNulls/16M/10k/min_time:1.000/real_time      111961 us   111965 us    13 1.11646GB/s
BM_UniqueInt64WithNulls/16M/1024k/min_time:1.000/real_time    531226 us   531244 us     3 240.952MB/s
BM_UniqueString10bytes/16M/50/min_time:1.000/real_time        150719 us   150724 us     9 1061.58MB/s
BM_UniqueString10bytes/16M/1024/min_time:1.000/real_time      193408 us   193413 us     7 827.266MB/s
BM_UniqueString10bytes/16M/10k/min_time:1.000/real_time       278841 us   278851 us     5 573.803MB/s
BM_UniqueString10bytes/16M/1024k/min_time:1.000/real_time    1700923 us  1700978 us     1 94.0666MB/s
BM_UniqueString100bytes/16M/50/min_time:1.000/real_time       563184 us   563204 us     2 2.7744GB/s
BM_UniqueString100bytes/16M/1024/min_time:1.000/real_time     636416 us   636436 us     2 2.45515GB/s
BM_UniqueString100bytes/16M/10k/min_time:1.000/real_time      861909 us   861941 us     2 1.81284GB/s
BM_UniqueString100bytes/16M/1024k/min_time:1.000/real_time   3651238 us  3651359 us     1 438.208MB/s
```
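For readers unfamiliar with the load factor threshold mentioned above, here is a
minimal sketch of the idea. It is not Arrow's hash kernel code: the names
`Int64UniqueTable`, `UniqueInt64`, `kInitialSlots` and `kEmptySlot`, the hash
mixing constant, and the linear probing scheme are all assumptions made for
illustration. The table doubles its slot array whenever the number of distinct
values exceeds `max_load_factor` times the number of slots, so a threshold of
0.5 keeps at least half of the slots empty and probe chains short.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative sketch only; NOT the implementation used in Arrow.
namespace {
const std::size_t kInitialSlots = 16;  // must be a power of two
const std::int32_t kEmptySlot = -1;    // marks an unused slot
}  // namespace

// Open-addressing table with linear probing. Each slot stores an index into
// uniques_ (or kEmptySlot), so distinct values are kept in first-seen order.
// Indices are 32-bit, which limits this sketch to about 2^31 distinct values.
class Int64UniqueTable {
 public:
  explicit Int64UniqueTable(double max_load_factor = 0.5)
      : max_load_factor_(max_load_factor), slots_(kInitialSlots, -1) {}

  void Insert(std::int64_t value) {
    const std::size_t mask = slots_.size() - 1;
    std::size_t pos = Hash(value) & mask;
    while (slots_[pos] != kEmptySlot) {
      if (uniques_[slots_[pos]] == value) return;  // already seen
      pos = (pos + 1) & mask;                      // probe the next slot
    }
    slots_[pos] = static_cast<std::int32_t>(uniques_.size());
    uniques_.push_back(value);
    // Resize once the load factor exceeds the threshold.
    if (uniques_.size() > max_load_factor_ * slots_.size()) Grow();
  }

  const std::vector<std::int64_t>& uniques() const { return uniques_; }
  std::size_t num_slots() const { return slots_.size(); }

 private:
  // Multiplicative mixing; the constant is an arbitrary choice for the sketch.
  static std::size_t Hash(std::int64_t v) {
    const std::uint64_t h =
        static_cast<std::uint64_t>(v) * 0x9E3779B97F4A7C15ULL;
    return static_cast<std::size_t>(h >> 32);
  }

  // Double the slot array and re-insert the index of every distinct value.
  void Grow() {
    std::vector<std::int32_t> bigger(slots_.size() * 2, -1);
    const std::size_t mask = bigger.size() - 1;
    for (std::size_t i = 0; i < uniques_.size(); ++i) {
      std::size_t pos = Hash(uniques_[i]) & mask;
      while (bigger[pos] != kEmptySlot) pos = (pos + 1) & mask;
      bigger[pos] = static_cast<std::int32_t>(i);
    }
    slots_.swap(bigger);
  }

  double max_load_factor_;
  std::vector<std::int32_t> slots_;
  std::vector<std::int64_t> uniques_;
};

// Returns the distinct values of `values` in first-seen order.
std::vector<std::int64_t> UniqueInt64(const std::vector<std::int64_t>& values,
                                      double max_load_factor = 0.5) {
  Int64UniqueTable table(max_load_factor);
  for (std::int64_t v : values) table.Insert(v);
  return table.uniques();
}
```

For example, `UniqueInt64({3, 1, 3, 2, 1})` returns the values 3, 1, 2. A lower
threshold trades memory for fewer collisions: the extra slots cost little when
the table is small, which is consistent with the low-cardinality cases above
benefiting the most.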
> [C++] Basic benchmark suite for hash kernels
> --------------------------------------------
>
> Key: ARROW-1844
> URL: https://issues.apache.org/jira/browse/ARROW-1844
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++
> Reporter: Wes McKinney
> Assignee: Wes McKinney
> Labels: pull-request-available
> Fix For: 0.8.0
>
>
> * Integers, small cardinality and large cardinality
> * Short strings, small/large cardinality
> * Long strings, small/large cardinality
> These benchmarks will enable us to refactor without fear and to experiment
> with faster hash functions.