[
https://issues.apache.org/jira/browse/ORC-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17432878#comment-17432878
]
Quanlong Huang commented on ORC-1020:
-------------------------------------
I uploaded a [PR|https://github.com/apache/orc/pull/944] with some initial
performance results from scanning 1 billion random unsigned numbers:
||bitSize||fileSize||old utime (s)||opt utime (s)||speedup||old instructions||opt instructions||instruction ratio (old/opt)||
|4|524M|5.93|3.25|1.824615385|85,211,309,815|50,340,977,903|1.692682847|
|8|971M|6.24|2.62|2.381679389|97,019,508,601|43,314,985,276|2.239860131|
|16|1.9G|7.24|2.59|2.795366795|116,173,358,035|44,638,334,445|2.60254688|
|24|2.9G|8.28|3.04|2.723684211|135,108,191,144|53,598,230,651|2.520758419|
|32|3.8G|9.45|2.9|3.25862069|154,212,921,271|48,846,038,538|3.157122377|
|40|4.7G|10.62|3.42|3.105263158|173,383,793,909|62,105,810,766|2.791748337|
|48|5.7G|11.67|3.56|3.278089888|192,483,545,716|63,364,191,632|3.037733786|
|56|6.6G|12.86|4.21|3.054631829|211,621,537,386|69,625,640,158|3.039419629|
|64|7.5G|14.05|7.76|1.81056701|230,752,541,375|53,358,474,439|4.324571566|
More details can be found in the sheet:
[https://docs.google.com/spreadsheets/d/12ApcTxzJLQtfdPJuTZqZUJvbnUYDbSX1XYtcEJAdIcg/edit?usp=sharing]
Note that the test uses unsigned numbers so that the results exclude the overhead of zigzag decoding.
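
For context, signed values need one extra decode step per value after bit unpacking. A minimal sketch of zigzag decoding follows; the function name zigzagDecode is hypothetical and chosen for illustration only, not taken from the ORC sources:

```cpp
#include <cstdint>

// Hypothetical helper (name is an assumption, not the ORC API).
// Maps a zigzag-encoded value back to a signed integer:
// 0 -> 0, 1 -> -1, 2 -> 1, 3 -> -2, ...
inline int64_t zigzagDecode(uint64_t value) {
  // The low bit carries the sign; the remaining bits carry the magnitude.
  return static_cast<int64_t>(value >> 1) ^ -static_cast<int64_t>(value & 1);
}
```

This is a shift, a mask, a negation and an XOR for every decoded value, which is the cost the unsigned test data leaves out of the measurements.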
> Improve orc::RleDecoderV2::nextDirect
> -------------------------------------
>
> Key: ORC-1020
> URL: https://issues.apache.org/jira/browse/ORC-1020
> Project: ORC
> Issue Type: Improvement
> Components: C++
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Major
> Attachments: orc-scan-release-lineitem-random-bigint-snappy.svg
>
>
> [~drorke] found that orc::RleDecoderV2::nextDirect takes the
> majority of the time when scanning bigint columns. I reproduced the issue by
> using the orc-scan tool to read the random bigint columns of a TPCH lineitem
> table. In the attached flame graph, 91.89% of the time is spent in
> orc::RleDecoderV2::nextDirect. Only a small portion of the time is spent in
> snappy decompression.
> Note that orc::RleDecoderV2::nextDirect is also used in other column types,
> e.g. dictionary encoded string columns. So improving it can boost performance
> in many scenarios.
> We should consider unrolling the loop in orc::RleDecoderV2::readLongs. There
> is already a TODO:
> [https://github.com/apache/orc/blob/93af6b076c210b0c3b77e5af3d6fbef1bd1150a1/c%2B%2B/src/RLEv2.hh#L186]
> [~csringhofer] also points out that we can borrow some ideas from Impala's
> bit unpacking:
> [https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/util/bit-packing.inline.h#L60]
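
The unrolling idea in the description above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the PR or from Impala: the names unpackGeneric, unpack16 and unpack are hypothetical, the input is assumed to be a contiguous buffer holding enough bytes, and values are assumed to be packed most significant bit first.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical generic path: consumes the input one byte at a time and
// tracks how many unread bits remain, for any bit width from 1 to 64.
inline void unpackGeneric(const uint8_t* in, uint64_t* out, size_t count,
                          uint32_t bitWidth) {
  uint32_t bitsLeft = 0;  // unread bits remaining in `current`
  uint8_t current = 0;
  for (size_t i = 0; i < count; ++i) {
    uint64_t result = 0;
    uint32_t need = bitWidth;
    while (need > bitsLeft) {
      // Take all remaining bits of the current byte, then fetch the next one.
      result <<= bitsLeft;
      result |= current & ((1u << bitsLeft) - 1);
      need -= bitsLeft;
      current = *in++;
      bitsLeft = 8;
    }
    if (need > 0) {
      // Take the top `need` unread bits of the current byte.
      result <<= need;
      bitsLeft -= need;
      result |= (current >> bitsLeft) & ((1u << need) - 1);
    }
    out[i] = result;
  }
}

// Hypothetical specialized path for 16-bit values: no per-bit bookkeeping,
// and the loop body is unrolled four values at a time.
inline void unpack16(const uint8_t* in, uint64_t* out, size_t count) {
  size_t i = 0;
  for (; i + 4 <= count; i += 4, in += 8) {
    out[i]     = (uint64_t{in[0]} << 8) | in[1];
    out[i + 1] = (uint64_t{in[2]} << 8) | in[3];
    out[i + 2] = (uint64_t{in[4]} << 8) | in[5];
    out[i + 3] = (uint64_t{in[6]} << 8) | in[7];
  }
  for (; i < count; ++i, in += 2) {  // leftover values
    out[i] = (uint64_t{in[0]} << 8) | in[1];
  }
}

// Dispatch once per run so the hot loop works with a fixed width.
inline void unpack(const uint8_t* in, uint64_t* out, size_t count,
                   uint32_t bitWidth) {
  switch (bitWidth) {
    case 16: unpack16(in, out, count); break;
    default: unpackGeneric(in, out, count, bitWidth); break;
  }
}
```

The specialized loop has no data-dependent inner branch, which is the kind of change that would reduce the instruction count per value; whether other widths deserve their own paths is a design choice this sketch does not settle.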
--
This message was sent by Atlassian Jira
(v8.3.4#803005)