Quanlong Huang created ORC-1020:
-----------------------------------

             Summary: Improve orc::RleDecoderV2::nextDirect
                 Key: ORC-1020
                 URL: https://issues.apache.org/jira/browse/ORC-1020
             Project: ORC
          Issue Type: Improvement
          Components: C++
            Reporter: Quanlong Huang
            Assignee: Quanlong Huang
         Attachments: orc-scan-release-lineitem-random-bigint-snappy.svg

[~drorke] found that orc::RleDecoderV2::nextDirect takes the 
majority of the time when scanning bigint columns. I reproduced the issue by 
using the orc-scan tool to read the random bigint columns of a TPC-H lineitem 
table. In the attached flame graph, 91.89% of the time is spent in 
orc::RleDecoderV2::nextDirect. Only a small portion of the time is spent in 
snappy decompression.

We should consider unrolling the loop in orc::RleDecoderV2::readLongs. There is 
already a TODO: 
[https://github.com/apache/orc/blob/93af6b076c210b0c3b77e5af3d6fbef1bd1150a1/c%2B%2B/src/RLEv2.hh#L186]
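To illustrate the kind of unrolling meant here, below is a minimal sketch of a fast path for values that are exactly 8 bytes wide, which is probably the common case for random bigint data. It assumes the values are stored as big-endian bytes. The function name unpack64BigEndian and its signature are hypothetical and are not part of the existing ORC code.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not an existing ORC function): decodes `count` values
// that are each stored as 8 consecutive big-endian bytes (bit width 64).
inline void unpack64BigEndian(const unsigned char* in, std::size_t count,
                              int64_t* out) {
  for (std::size_t i = 0; i < count; ++i, in += 8) {
    // The per-byte inner loop is fully unrolled: eight loads combined with
    // constant shifts, so there is no per-bit or per-byte loop bookkeeping.
    uint64_t v = (static_cast<uint64_t>(in[0]) << 56) |
                 (static_cast<uint64_t>(in[1]) << 48) |
                 (static_cast<uint64_t>(in[2]) << 40) |
                 (static_cast<uint64_t>(in[3]) << 32) |
                 (static_cast<uint64_t>(in[4]) << 24) |
                 (static_cast<uint64_t>(in[5]) << 16) |
                 (static_cast<uint64_t>(in[6]) << 8) |
                 static_cast<uint64_t>(in[7]);
    out[i] = static_cast<int64_t>(v);
  }
}
```

A real change would still need the generic path for other bit widths and for values that span input buffer boundaries.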

[~csringhofer] also points out that we can borrow some ideas from Impala's 
bit unpacking: 
[https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/util/bit-packing.inline.h#L60]
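One idea in that direction is to make the bit width a compile-time constant, so that the compiler can constant-fold the masks and unroll the loop. The following is only a rough sketch of that idea and is not Impala's code. It assumes values are packed most-significant-bit first, and the name unpackFixedWidth is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: unpacks `count` values of BIT_WIDTH bits each from a
// byte stream in which values are packed most-significant-bit first.
// BIT_WIDTH is a template parameter, so the mask is a compile-time constant.
template <unsigned BIT_WIDTH>
inline void unpackFixedWidth(const unsigned char* in, std::size_t count,
                             int64_t* out) {
  static_assert(BIT_WIDTH >= 1 && BIT_WIDTH <= 56,
                "this sketch keeps all pending bits in one 64-bit word");
  constexpr uint64_t MASK = (uint64_t{1} << BIT_WIDTH) - 1;
  uint64_t buffer = 0;        // unread bits live in the low `bitsInBuffer` bits
  unsigned bitsInBuffer = 0;  // number of valid unread bits in `buffer`
  for (std::size_t i = 0; i < count; ++i) {
    // Refill one byte at a time until a whole value is available.
    while (bitsInBuffer < BIT_WIDTH) {
      buffer = (buffer << 8) | *in++;
      bitsInBuffer += 8;
    }
    // Take the top BIT_WIDTH unread bits as the next value.
    bitsInBuffer -= BIT_WIDTH;
    out[i] = static_cast<int64_t>((buffer >> bitsInBuffer) & MASK);
  }
}
```

The caller would pick the instantiation with a switch on the runtime bit width, once per run rather than once per value. The function reads only as many input bytes as the requested values need.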



--
This message was sent by Atlassian Jira
(v8.3.4#803005)