[jira] [Updated] (HIVE-7219) Improve performance of serialization utils in ORC

Prasanth J (JIRA) Wed, 11 Jun 2014 14:52:34 -0700

     [ 
https://issues.apache.org/jira/browse/HIVE-7219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Prasanth J updated HIVE-7219:
-----------------------------

    Attachment: orc-read-perf-jmh-benchmark.png

Ran some benchmarks to measure the reader improvements. Used JMH with 
10 warmup iterations and 10 measurement iterations. Only datasets that make 
use of bit packing were chosen for this benchmark.
Number of rows in each dataset:
inventory_col2 and inventory_col4: 11745000
twitter_census_api_id: 24556361
twitter_search_id: 9396618
github_payload_size: 3216293
aol_querylog_epoch: 3558411
random.nextLong(): 10000000

> Improve performance of serialization utils in ORC
> -------------------------------------------------
>
>                 Key: HIVE-7219
>                 URL: https://issues.apache.org/jira/browse/HIVE-7219
>             Project: Hive
>          Issue Type: Improvement
>          Components: File Formats
>    Affects Versions: 0.14.0
>            Reporter: Prasanth J
>            Assignee: Prasanth J
>         Attachments: HIVE-7219.1.patch, orc-read-perf-jmh-benchmark.png
>
>
> ORC uses serialization utils heavily for reading and writing data. The 
> bitpacking and unpacking code in writeInts() and readInts() can be unrolled 
> for better performance. The double reader/writer can also be made faster 
> by reading from and writing to a byte array in bulk.
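
To illustrate the two ideas in the description, here is a minimal sketch. It is not the code from HIVE-7219.1.patch: the class and method names (BitUnpackSketch, unpackGeneric, unpack4, readDoublesBulk) are hypothetical, the 4-bit width is just one example case, and the most-significant-bit-first packing and little-endian double layout are assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch; names and layout assumptions are illustrative only.
final class BitUnpackSketch {

    // Generic path: assemble each value one bit at a time,
    // most significant bit first (assumed packing order).
    static void unpackGeneric(byte[] in, int bitWidth, long[] out, int count) {
        int bitPos = 0; // absolute bit offset into `in`
        for (int i = 0; i < count; i++) {
            long value = 0;
            for (int b = 0; b < bitWidth; b++, bitPos++) {
                int bit = (in[bitPos >>> 3] >>> (7 - (bitPos & 7))) & 1;
                value = (value << 1) | bit;
            }
            out[i] = value;
        }
    }

    // Unrolled path for the fixed 4-bit case: every input byte holds
    // exactly two values, so the per-bit inner loop disappears.
    // For example, the byte 0xAB unpacks to 0xA followed by 0xB.
    static void unpack4(byte[] in, long[] out, int count) {
        int i = 0;
        int src = 0;
        // Main loop: consume 2 bytes and emit 4 values per iteration.
        for (; i + 4 <= count; i += 4, src += 2) {
            int b0 = in[src] & 0xff;
            int b1 = in[src + 1] & 0xff;
            out[i] = b0 >>> 4;
            out[i + 1] = b0 & 0x0f;
            out[i + 2] = b1 >>> 4;
            out[i + 3] = b1 & 0x0f;
        }
        // Tail: handle the remaining 0 to 3 values one at a time.
        for (; i < count; i++) {
            int b = in[i >>> 1] & 0xff;
            out[i] = (i & 1) == 0 ? (b >>> 4) : (b & 0x0f);
        }
    }

    // Bulk double read: view the byte array as doubles in one call
    // instead of assembling each value byte by byte.
    // Little-endian layout is an assumption of this sketch.
    static double[] readDoublesBulk(byte[] in, int count) {
        double[] out = new double[count];
        ByteBuffer.wrap(in, 0, count * 8)
                  .order(ByteOrder.LITTLE_ENDIAN)
                  .asDoubleBuffer()
                  .get(out);
        return out;
    }
}
```

Both unpack methods produce the same values for 4-bit input. The unrolled one trades generality for straight-line code with no inner loop, which is the kind of change the benchmark is meant to measure.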



--
This message was sent by Atlassian JIRA
(v6.2#6252)
