[
https://issues.apache.org/jira/browse/IMPALA-11037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17451600#comment-17451600
]
Quanlong Huang commented on IMPALA-11037:
-----------------------------------------
I ran an experiment with a query that scans 1B random integers. The flame
graphs show that the time spent in orc::RowReaderImpl::next() drops
substantially:
* For the 32-bit random integer test case, it drops from 1799 samples to 555
samples.
* For the 64-bit random integer test case, it drops from 2884 samples to 554
samples.
perf record sampled at 99 Hz, so the number of samples is proportional to the
time spent in the function.
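
As a rough illustration of what these counts mean, the sketch below converts
them into approximate sampled time and a speedup ratio. It assumes each sample
stands for 1/99 of a second of time in the function; the variable and function
names are made up for this example and are not part of Impala or ORC.

```python
SAMPLE_HZ = 99  # sampling frequency stated in the comment

# Sample counts for orc::RowReaderImpl::next() as reported above: (before, after)
samples = {
    "int32": (1799, 555),
    "int64": (2884, 554),
}


def summarize(before, after, hz=SAMPLE_HZ):
    """Return (seconds_before, seconds_after, speedup) for one test case.

    Assumes each sample represents 1/hz seconds spent in the function.
    """
    return before / hz, after / hz, before / after


results = {name: summarize(b, a) for name, (b, a) in samples.items()}

for name, (sec_before, sec_after, speedup) in results.items():
    print(f"{name}: {sec_before:.1f}s -> {sec_after:.1f}s ({speedup:.2f}x)")
```

Under that assumption the 32-bit case spends roughly 3.2x less time in
next() and the 64-bit case roughly 5.2x less. If the samples were collected
across several threads, the converted seconds are closer to CPU time than to
wall-clock time.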
> Bump ORC to 1.7-p4 to contain the improvement of ORC-1020
> ---------------------------------------------------------
>
> Key: IMPALA-11037
> URL: https://issues.apache.org/jira/browse/IMPALA-11037
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Major
> Attachments: orc_1.7.0-p3_random_int32.svg,
> orc_1.7.0-p3_random_int64.svg, orc_1.7.0-p4_random_int32.svg,
> orc_1.7.0-p4_random_int64.svg
>
>
> ORC-1020 improves the read performance of the ORC library when scanning
> random integers. Columns that are encoded as integers, e.g. dictionary-encoded
> strings, will also benefit from this.
> This Jira aims to add ORC-1020 to our native-toolchain and bump our ORC
> version to 1.7-p4 to include it.