[ 
https://issues.apache.org/jira/browse/ARROW-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16269706#comment-16269706
 ] 

ASF GitHub Bot commented on ARROW-1844:
---------------------------------------

wesm opened a new pull request #1370: ARROW-1844: [C++] Add initial Unique 
benchmarks for int64, variable-length strings
URL: https://github.com/apache/arrow/pull/1370
 
 
   I also fixed a bug in the hash table resize that these benchmarks surfaced 
(unit test coverage was not adequate).
   
   Now we have
   
   ```
   $ ./release/compute-benchmark 
   Run on (8 X 4174.84 MHz CPU s)
   2017-11-28 18:18:26
   Benchmark                                                         Time         CPU  Iterations
   ----------------------------------------------------------------------------------------------------------
   BM_BuildDictionary/min_time:1.000                                1451 us     1451 us        959   2.68974GB/s
   BM_BuildStringDictionary/min_time:1.000                          4005 us     4005 us        350   75.3785MB/s
   BM_UniqueInt64NoNulls/16M/50/min_time:1.000/real_time           35940 us    35942 us         39   91.3192MB/s
   BM_UniqueInt64NoNulls/16M/1024/min_time:1.000/real_time        120002 us   120006 us         12   88.8877MB/s
   BM_UniqueInt64NoNulls/16M/10k/min_time:1.000/real_time         175855 us   175862 us          8   90.9838MB/s
   BM_UniqueInt64NoNulls/16M/1024k/min_time:1.000/real_time       452242 us   452257 us          3   94.3449MB/s
   BM_UniqueInt64WithNulls/16M/50/min_time:1.000/real_time         58632 us    58634 us         29   75.2797MB/s
   BM_UniqueInt64WithNulls/16M/1024/min_time:1.000/real_time      134079 us   134084 us         10   95.4661MB/s
   BM_UniqueInt64WithNulls/16M/10k/min_time:1.000/real_time       183846 us   183851 us          8   87.0295MB/s
   BM_UniqueInt64WithNulls/16M/1024k/min_time:1.000/real_time     528790 us   528808 us          3   80.6873MB/s
   BM_UniqueString10bytes/16M/50/min_time:1.000/real_time         152207 us   152212 us          9     116.8MB/s
   BM_UniqueString10bytes/16M/1024/min_time:1.000/real_time       260047 us   260056 us          5   123.055MB/s
   BM_UniqueString10bytes/16M/10k/min_time:1.000/real_time        426539 us   426552 us          3   125.038MB/s
   BM_UniqueString10bytes/16M/1024k/min_time:1.000/real_time     1716739 us  1716791 us          1      93.2MB/s
   BM_UniqueString100bytes/16M/50/min_time:1.000/real_time        556145 us   556165 us          3   958.982MB/s
   BM_UniqueString100bytes/16M/1024/min_time:1.000/real_time      693922 us   693943 us          2   1.12585GB/s
   BM_UniqueString100bytes/16M/10k/min_time:1.000/real_time      1000449 us  1000484 us          1    1.5618GB/s
   BM_UniqueString100bytes/16M/1024k/min_time:1.000/real_time    3591215 us  3591314 us          1   445.532MB/s
   ```
   
   This suggests quite a lot of room for improvement. It's counter-intuitive 
to me that hashing strings appears faster than hashing integers when judged by 
the reported throughput, so we should figure out what's going on there.
   
   With these benchmarks in place, we can also refactor the hash table 
implementations without worrying too much about whether we're making things 
slower.
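
To illustrate the kind of measurement these benchmarks make, here is a minimal, self-contained sketch. It is not the code from this pull request: it uses `std::unordered_set` as a stand-in for the hash kernel and `std::chrono` in place of the benchmark harness, and the names `MakeValues`, `CountUnique` and `MeasureMBPerSec` are hypothetical helpers introduced only for this example.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <random>
#include <unordered_set>
#include <vector>

// Hypothetical helper: build `length` int64 values drawn uniformly from
// `num_unique` distinct keys, so the cardinality of the input is controlled.
std::vector<int64_t> MakeValues(int64_t length, int64_t num_unique,
                                uint64_t seed = 0) {
  assert(length >= 0 && num_unique > 0);
  std::mt19937_64 gen(seed);
  std::uniform_int_distribution<int64_t> dist(0, num_unique - 1);
  std::vector<int64_t> values(static_cast<size_t>(length));
  for (auto& v : values) v = dist(gen);
  return values;
}

// Stand-in for a Unique kernel (an assumption, not Arrow's implementation):
// count the distinct values with a standard hash set.
int64_t CountUnique(const std::vector<int64_t>& values) {
  std::unordered_set<int64_t> seen(values.begin(), values.end());
  return static_cast<int64_t>(seen.size());
}

// Time a single pass over the input and return the throughput in MB/s of
// input bytes. The distinct count is returned through `out_unique`.
double MeasureMBPerSec(const std::vector<int64_t>& values, int64_t* out_unique) {
  const auto start = std::chrono::steady_clock::now();
  *out_unique = CountUnique(values);
  const auto stop = std::chrono::steady_clock::now();
  const double seconds = std::chrono::duration<double>(stop - start).count();
  const double bytes = static_cast<double>(values.size()) * sizeof(int64_t);
  return bytes / seconds / (1024.0 * 1024.0);
}
```

Calling `MeasureMBPerSec(MakeValues(1 << 20, 1024), &n)` for several cardinalities gives a rough picture of how throughput varies with the number of distinct keys. A single timed pass is noisy, so a real harness would repeat the measurement many times.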



> [C++] Basic benchmark suite for hash kernels
> --------------------------------------------
>
>                 Key: ARROW-1844
>                 URL: https://issues.apache.org/jira/browse/ARROW-1844
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++
>            Reporter: Wes McKinney
>            Assignee: Wes McKinney
>              Labels: pull-request-available
>             Fix For: 0.8.0
>
>
> * Integers, small cardinality and large cardinality
> * Short strings, small/large cardinality
> * Long strings, small/large cardinality
>
> These benchmarks will enable us to refactor without fear, and to experiment 
> with faster hash functions.


