[ 
https://issues.apache.org/jira/browse/ARROW-3496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16647653#comment-16647653
 ] 

Animesh Trivedi commented on ARROW-3496:
----------------------------------------

To give you an idea of what I have so far, see 
https://github.com/animeshtrivedi/benchmarking-arrow 
(its README is outdated). It is a standalone Java program with: 

i) a basic data generation template that generates data for integer, long, and 
binary column types (we can extend it to arbitrary types and schemas) 

ii) in-memory data buffers to hold the generated data (either on-heap or 
off-heap buffers) 

iii) readers to consume the generated data using various APIs (calling get*(), 
the holder API variant, or writing your own readers from the direct byte 
buffers) 
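
As a rough illustration of the three steps above (this is not code from the 
repository), the sketch below generates integers, places them in either an 
on-heap or an off-heap buffer, and reads them back straight from the buffer. 
It uses plain java.nio.ByteBuffer instead of Arrow vectors so that it runs 
without any dependency; the class name BufferReadSketch and its methods are 
made up for this example. 

```java
import java.nio.ByteBuffer;
import java.util.Random;

// Hypothetical stand-in for the generate / buffer / read pipeline.
// Uses java.nio buffers, NOT the Arrow Java API.
public class BufferReadSketch {

    // Step i: generate `count` pseudo-random ints from a fixed seed.
    static int[] generate(int count, long seed) {
        Random rng = new Random(seed);
        int[] values = new int[count];
        for (int i = 0; i < count; i++) {
            values[i] = rng.nextInt();
        }
        return values;
    }

    // Step ii: hold the data in an on-heap or off-heap (direct) buffer.
    static ByteBuffer fill(int[] values, boolean offHeap) {
        int bytes = values.length * Integer.BYTES;
        ByteBuffer buf = offHeap
                ? ByteBuffer.allocateDirect(bytes)
                : ByteBuffer.allocate(bytes);
        for (int v : values) {
            buf.putInt(v);
        }
        buf.flip(); // switch from writing to reading
        return buf;
    }

    // Step iii: consume the data with absolute reads and return a checksum,
    // so the JIT cannot discard the loop as dead code.
    static long sum(ByteBuffer buf) {
        long total = 0;
        for (int pos = buf.position(); pos < buf.limit(); pos += Integer.BYTES) {
            total += buf.getInt(pos);
        }
        return total;
    }

    public static void main(String[] args) {
        int[] values = generate(1_000_000, 42L);
        for (boolean offHeap : new boolean[] {false, true}) {
            ByteBuffer buf = fill(values, offHeap);
            long start = System.nanoTime();
            long checksum = sum(buf); // only the read step is timed
            long elapsedNs = System.nanoTime() - start;
            System.out.printf("%s: checksum=%d, %.2f ms%n",
                    offHeap ? "off-heap" : "on-heap", checksum, elapsedNs / 1e6);
        }
    }
}
```

A single timed pass like this includes JIT warm-up, so the numbers are only 
indicative; a real microbenchmark would repeat the read step many times or use 
a harness such as JMH. 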

The whole benchmark is multi-threaded, and all 3 steps can be done in parallel. 
Usually it is the last step that is benchmarked. The current code base 
obviously contains a lot more code for my own testing and understanding, but 
we can clean it up gradually. 
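
To make the multi-threaded part concrete, here is a minimal sketch, assuming 
one worker thread per data partition and timing only the read step. It is not 
taken from the repository; ParallelReadSketch and readInParallel are 
hypothetical names, and plain int arrays stand in for Arrow buffers. 

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: each partition is read by its own worker thread.
public class ParallelReadSketch {

    // Sums every partition on a separate thread and returns the combined
    // checksum. Requires at least one partition.
    static long readInParallel(int[][] partitions) {
        ExecutorService pool = Executors.newFixedThreadPool(partitions.length);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (int[] part : partitions) {
                results.add(pool.submit(() -> {
                    long total = 0;
                    for (int v : part) {
                        total += v;
                    }
                    return total;
                }));
            }
            long checksum = 0;
            for (Future<Long> f : results) {
                checksum += f.get(); // blocks until that worker is done
            }
            return checksum;
        } catch (Exception e) {
            throw new IllegalStateException("reader task failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int threads = 4;
        int rowsPerThread = 1_000_000;
        int[][] partitions = new int[threads][rowsPerThread];
        for (int t = 0; t < threads; t++) {
            for (int i = 0; i < rowsPerThread; i++) {
                partitions[t][i] = i; // trivial generated data
            }
        }
        long start = System.nanoTime();
        long checksum = readInParallel(partitions); // timed step
        long elapsedNs = System.nanoTime() - start;
        System.out.printf("threads=%d checksum=%d time=%.2f ms%n",
                threads, checksum, elapsedNs / 1e6);
    }
}
```

The timed region here also covers thread-pool start-up, which a more careful 
benchmark would exclude, for example by creating the pool before starting the 
clock. 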

Where do we want to have this code, and how should a user run it? Maybe as 
part of the default build process, where the benchmark is compiled as a 
separate jar (arrow-java-benchmarks-0.12.jar, or something like this). 

> [Java] Add microbenchmark code to Java
> --------------------------------------
>
>                 Key: ARROW-3496
>                 URL: https://issues.apache.org/jira/browse/ARROW-3496
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Java
>    Affects Versions: 0.11.0
>            Reporter: Li Jin
>            Priority: Major
>
> [~atrivedi] has done some microbenchmarking with the Java API. Let's consider 
> adding them to the codebase.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)