[ 
https://issues.apache.org/jira/browse/IMPALA-11037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17451600#comment-17451600
 ] 

Quanlong Huang commented on IMPALA-11037:
-----------------------------------------

I ran an experiment on a query that scans 1B random integers. The flame graphs 
show that the time spent in orc::RowReaderImpl::next() drops substantially:
 * For the 32-bit random integer test case, it drops from 1799 samples to 555 
samples.
 * For the 64-bit random integer test case, it drops from 2884 samples to 554 
samples.

perf record samples at 99 Hz, so the number of samples is proportional to the 
elapsed time.
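
As a rough illustration of how the sample counts above translate into time 
and speedup, here is a minimal Python sketch. It assumes each sample stands 
for 1/99 s of on-CPU time in a single thread. The helper names 
(approx_seconds, speedup) are hypothetical and not part of perf or ORC.

```python
SAMPLE_HZ = 99  # sampling frequency stated in the comment above

# Sample counts in orc::RowReaderImpl::next() quoted above: (before, after)
samples = {
    "int32": (1799, 555),
    "int64": (2884, 554),
}

def approx_seconds(n_samples, hz=SAMPLE_HZ):
    """Approximate on-CPU time, assuming one sample per 1/hz seconds."""
    return n_samples / hz

def speedup(before, after):
    """Ratio of sample counts before and after the ORC bump."""
    return before / after

for name, (before, after) in samples.items():
    print(f"{name}: {approx_seconds(before):.1f}s -> "
          f"{approx_seconds(after):.1f}s, "
          f"{speedup(before, after):.1f}x fewer samples")
```

Under these assumptions the reduction is roughly 3.2x for the 32-bit case and 
5.2x for the 64-bit case. Both cases end up at about the same cost after the 
bump.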

> Bump ORC to 1.7-p4 to contain the improvement of ORC-1020
> ---------------------------------------------------------
>
>                 Key: IMPALA-11037
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11037
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Major
>         Attachments: orc_1.7.0-p3_random_int32.svg, 
> orc_1.7.0-p3_random_int64.svg, orc_1.7.0-p4_random_int32.svg, 
> orc_1.7.0-p4_random_int64.svg
>
>
> ORC-1020 improves the read performance of the ORC library when scanning 
> random integers. Columns that are encoded as integers, e.g. 
> dictionary-encoded strings, will also benefit from this.
> This Jira aims to add ORC-1020 to our native-toolchain and bump our ORC 
> version to 1.7-p4 to include it.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

