[ https://issues.apache.org/jira/browse/HIVE-7219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Prasanth J updated HIVE-7219:
-----------------------------
    Attachment: orc-read-perf-jmh-benchmark.png

Ran some benchmarks to measure the reader improvements. Used JMH with 10 warmup iterations and 10 benchmark iterations. Only datasets that make use of bit packing were chosen for this benchmark.

Number of rows in each dataset:
inventory_col2 and inventory_col4: 11745000
twitter_census_api_id: 24556361
twitter_search_id: 9396618
github_payload_size: 3216293
aol_querylog_epoch: 3558411
random.nextLong(): 10000000

> Improve performance of serialization utils in ORC
> -------------------------------------------------
>
>                 Key: HIVE-7219
>                 URL: https://issues.apache.org/jira/browse/HIVE-7219
>             Project: Hive
>          Issue Type: Improvement
>          Components: File Formats
>    Affects Versions: 0.14.0
>            Reporter: Prasanth J
>            Assignee: Prasanth J
>         Attachments: HIVE-7219.1.patch, orc-read-perf-jmh-benchmark.png
>
>
> ORC uses serialization utils heavily for reading and writing data. The
> bit packing and unpacking code in writeInts() and readInts() can be unrolled
> for better performance. Also, double reader/writer performance can be improved
> by bulk reading/writing from/to a byte array.
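
To illustrate the two ideas in the description above, here is a minimal sketch. It is not the code from HIVE-7219.1.patch. The class and method names (BitUnpackSketch, unpackGeneric, unpack4, readDoubleLE) are hypothetical, and the sketch assumes values are packed most-significant-bit first and doubles are stored as little-endian IEEE 754 bytes. unpackGeneric tracks leftover bits for every value, unpack4 is a version specialized for a 4-bit width that extracts two values per byte with no such bookkeeping, and readDoubleLE assembles a double from eight bytes that are already in a byte array instead of issuing one stream read per byte.

```java
final class BitUnpackSketch {

    /** Generic unpacking for any width from 1 to 64 bits, MSB first. */
    static void unpackGeneric(byte[] in, int bitWidth, long[] out, int count) {
        int bytePos = 0;
        int bitsLeft = 0; // unread bits remaining in 'current'
        int current = 0;  // most recently loaded byte, as an unsigned value
        for (int i = 0; i < count; i++) {
            long result = 0;
            int bitsToRead = bitWidth;
            // Consume whatever is left of the current byte, then load more bytes.
            while (bitsToRead > bitsLeft) {
                result <<= bitsLeft;
                result |= current & ((1 << bitsLeft) - 1);
                bitsToRead -= bitsLeft;
                current = in[bytePos++] & 0xff;
                bitsLeft = 8;
            }
            // Take the remaining bits from the top of the unread part.
            if (bitsToRead > 0) {
                result <<= bitsToRead;
                bitsLeft -= bitsToRead;
                result |= (current >> bitsLeft) & ((1 << bitsToRead) - 1);
            }
            out[i] = result;
        }
    }

    /** Version specialized for a 4-bit width: two values per input byte. */
    static void unpack4(byte[] in, long[] out, int count) {
        int bytePos = 0;
        int i = 0;
        for (; i + 1 < count; i += 2) {
            int b = in[bytePos++] & 0xff;
            out[i] = b >>> 4;       // high nibble
            out[i + 1] = b & 0x0f;  // low nibble
        }
        if (i < count) {
            // Odd count: only the high nibble of the last byte is used.
            out[i] = (in[bytePos] & 0xff) >>> 4;
        }
    }

    /** Builds a double from 8 little-endian bytes already held in a buffer. */
    static double readDoubleLE(byte[] buf, int off) {
        long bits = (buf[off] & 0xffL)
                | ((buf[off + 1] & 0xffL) << 8)
                | ((buf[off + 2] & 0xffL) << 16)
                | ((buf[off + 3] & 0xffL) << 24)
                | ((buf[off + 4] & 0xffL) << 32)
                | ((buf[off + 5] & 0xffL) << 40)
                | ((buf[off + 6] & 0xffL) << 48)
                | ((buf[off + 7] & 0xffL) << 56);
        return Double.longBitsToDouble(bits);
    }
}
```

Both unpackers produce the same values for 4-bit input. The specialized one simply removes the inner loop and the per-value state, which is the kind of change the description refers to as unrolling. A real implementation would need one such routine per supported bit width.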