[ https://issues.apache.org/jira/browse/ARROW-6172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liya Fan updated ARROW-6172:
----------------------------
    Summary: [Java] Provide benchmarks to set IntVector with different methods  
(was: [Java] Avoid creating value holders repeatedly when reading data from 
JDBC)

> [Java] Provide benchmarks to set IntVector with different methods
> -----------------------------------------------------------------
>
>                 Key: ARROW-6172
>                 URL: https://issues.apache.org/jira/browse/ARROW-6172
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Java
>            Reporter: Liya Fan
>            Assignee: Liya Fan
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> When converting JDBC data to Arrow data, a value holder is created for every
> single value. The following code snippet gives an example:
> // A new holder object is allocated for every value read from JDBC.
> NullableSmallIntHolder holder = new NullableSmallIntHolder();
> holder.isSet = isNonNull ? 1 : 0;
> if (isNonNull) {
>   holder.value = (short) value;
> }
> smallIntVector.setSafe(rowCount, holder);
> smallIntVector.setValueCount(rowCount + 1);
>  
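A minimal, Arrow-free sketch contrasting the holder-per-value pattern above with writing the value directly. `ToyIntVector` and `NullableIntHolderLike` are hypothetical stand-ins (a value array plus a validity array, and a plain two-field holder), not Arrow's classes or implementation. With a real Arrow `IntVector`, the direct path would roughly correspond to calling `setSafe(index, value)` for non-null values and `setNull(index)` for nulls.

```java
import java.util.Arrays;

public class HolderVsDirect {
    // Hypothetical stand-in for a nullable value holder (not Arrow's class).
    static final class NullableIntHolderLike {
        int isSet;
        int value;
    }

    // Hypothetical fixed-width vector: a value array plus a validity array.
    static final class ToyIntVector {
        final int[] values;
        final boolean[] valid;

        ToyIntVector(int capacity) {
            values = new int[capacity];
            valid = new boolean[capacity];
        }

        // Holder path: reads validity and value out of a caller-allocated holder.
        void set(int index, NullableIntHolderLike holder) {
            valid[index] = holder.isSet == 1;
            values[index] = holder.isSet == 1 ? holder.value : 0;
        }

        // Direct path: writes the primitive value with no intermediate object.
        void set(int index, int value) {
            valid[index] = true;
            values[index] = value;
        }

        // Marks a slot as null.
        void setNull(int index) {
            valid[index] = false;
            values[index] = 0;
        }
    }

    // Copies nullable source data, allocating one new holder per value.
    static ToyIntVector fillWithHolders(Integer[] source) {
        ToyIntVector vector = new ToyIntVector(source.length);
        for (int i = 0; i < source.length; i++) {
            NullableIntHolderLike holder = new NullableIntHolderLike();
            holder.isSet = source[i] != null ? 1 : 0;
            if (source[i] != null) {
                holder.value = source[i];
            }
            vector.set(i, holder);
        }
        return vector;
    }

    // Copies the same data by setting each value (or null) directly.
    static ToyIntVector fillDirectly(Integer[] source) {
        ToyIntVector vector = new ToyIntVector(source.length);
        for (int i = 0; i < source.length; i++) {
            if (source[i] != null) {
                vector.set(i, source[i].intValue());
            } else {
                vector.setNull(i);
            }
        }
        return vector;
    }

    public static void main(String[] args) {
        Integer[] source = {7, null, -3, 42};
        ToyIntVector a = fillWithHolders(source);
        ToyIntVector b = fillDirectly(source);
        // Both paths should produce identical values and validity.
        System.out.println(Arrays.equals(a.values, b.values)
                && Arrays.equals(a.valid, b.valid)); // prints true
    }
}
```

Both fills produce the same contents; the only difference is that `fillWithHolders` allocates one short-lived object per value. Any timing comparison is best done with a harness such as JMH rather than hand-written loops, since the JIT's escape analysis may eliminate some of the holder allocations.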
> This is inefficient in terms of both memory usage and computational
> efficiency.
> For most types, we can improve performance by setting the value directly.
> For example, benchmarks on IntVector show that setting the int value
> directly reduces the average time per operation by about 20%:
>  
> Benchmark                         Mode  Cnt   Score   Error  Units
> IntBenchmarks.setIntDirectly      avgt    5  15.397 ± 0.018  us/op
> IntBenchmarks.setWithValueHolder  avgt    5  19.198 ± 0.789  us/op
>  
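Assuming the "20%" figure refers to the reduction in average time per operation relative to the holder-based run (the issue does not state which baseline it uses), the scores above work out as follows. Expressed instead as a speedup ratio, the direct path is roughly 1.25 times faster.

```latex
\frac{19.198 - 15.397}{19.198} = \frac{3.801}{19.198} \approx 0.198,
\qquad
\frac{19.198}{15.397} \approx 1.247
```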



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
