[
https://issues.apache.org/jira/browse/ORC-1150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Quanlong Huang resolved ORC-1150.
---------------------------------
Fix Version/s: 1.8.0
Resolution: Fixed
Resolved by https://github.com/apache/orc/pull/1087
> [C++] Improve RowReaderImpl::computeBatchSize()
> -----------------------------------------------
>
> Key: ORC-1150
> URL: https://issues.apache.org/jira/browse/ORC-1150
> Project: ORC
> Issue Type: Improvement
> Components: C++
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Major
> Fix For: 1.8.0
>
> Attachments: RowReaderImpl_next_annotation.png,
> image-2022-04-12-17-11-28-091.png
>
>
> RowReaderImpl::computeBatchSize() can be a hot path when sargs exist. The
> following perf report shows that orc::RowReaderImpl::next() itself takes 1/4
> of the scan time. This was measured using orc-scan with the sargs
> "inv_quantity_on_hand between -1 and 5000", scanning 4 ORC files of the
> TPCDS inventory table (768.23MB in total size).
> !image-2022-04-12-17-11-28-091.png|width=713,height=251!
> The disassembly shows that the time is spent in a loop:
> !RowReaderImpl_next_annotation.png|width=556,height=465!
> The annotation indicates that this is the inlined
> RowReaderImpl::computeBatchSize() method. Disassembly:
> {code:java}
> │ d0:┌─→mov %r14,%r15
> 0.36 │ │ mov %esi,%ecx
> 0.13 │ │ shr $0x6,%rdx
> 22.81 │ │ shl %cl,%r15
> 24.24 │ │ test %r15,(%r9,%rdx,8)
> │ │↓ je fb
> │ e2:│ lea 0x1(%rsi),%edx
> 0.22 │ │ mov %r10,%rax
> 0.18 │ │ imul %rdx,%rax
> 25.31 │ │ mov %rdx,%rsi
> │ │ cmp %rdi,%rax
> 0.54 │ │ cmova %rdi,%rax
> 0.04 │ ├──cmp %r11,%rdx
> 23.79 │ └──jb d0
> 0.31 │ fb: sub %r8,%rax{code}
> The corresponding loop:
> {code:cpp}
> endRowInStripe = currentRowInStripe;
> uint32_t rg = static_cast<uint32_t>(currentRowInStripe / rowIndexStride);
> for (; rg < includedRowGroups.size(); ++rg) {
>   if (!includedRowGroups[rg]) {
>     break;
>   } else {
>     endRowInStripe = std::min(rowsInCurrentStripe, (rg + 1) * rowIndexStride);
>   }
> }
> {code}
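The loop above rescans the row-group bitmap on every call, so each call costs time proportional to the length of the current run of included row groups. One possible way to avoid that, sketched here under assumptions and not taken from the linked pull request, is to precompute for every row group the index of the first excluded row group at or after it, which turns the scan into a single lookup. The names buildRunEnds, endRowByScan and endRowByLookup are hypothetical and are not part of the ORC API; endRowByScan restates the loop from the issue as a free function so that the two variants can be compared.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not ORC API): runEnd[i] is the index of the first
// excluded row group at or after i, or included.size() if there is none.
std::vector<uint32_t> buildRunEnds(const std::vector<bool>& included) {
  std::vector<uint32_t> runEnd(included.size());
  uint32_t next = static_cast<uint32_t>(included.size());
  // Walk backwards so each entry can reuse the nearest excluded index seen so far.
  for (std::size_t i = included.size(); i-- > 0;) {
    if (!included[i]) {
      next = static_cast<uint32_t>(i);
    }
    runEnd[i] = next;
  }
  return runEnd;
}

// The loop from the issue, restated as a free function for comparison.
uint64_t endRowByScan(const std::vector<bool>& includedRowGroups,
                      uint64_t currentRowInStripe,
                      uint64_t rowsInCurrentStripe,
                      uint64_t rowIndexStride) {
  uint64_t endRowInStripe = currentRowInStripe;
  uint32_t rg = static_cast<uint32_t>(currentRowInStripe / rowIndexStride);
  for (; rg < includedRowGroups.size(); ++rg) {
    if (!includedRowGroups[rg]) {
      break;
    }
    endRowInStripe = std::min(rowsInCurrentStripe, (rg + 1) * rowIndexStride);
  }
  return endRowInStripe;
}

// Same result with one table lookup per call instead of a scan.
uint64_t endRowByLookup(const std::vector<uint32_t>& runEnd,
                        uint64_t currentRowInStripe,
                        uint64_t rowsInCurrentStripe,
                        uint64_t rowIndexStride) {
  assert(rowIndexStride > 0);
  uint32_t rg = static_cast<uint32_t>(currentRowInStripe / rowIndexStride);
  // Past the last row group, or the current row group is excluded:
  // the scan would not advance, so the end row is the current row.
  if (rg >= runEnd.size() || runEnd[rg] == rg) {
    return currentRowInStripe;
  }
  return std::min(rowsInCurrentStripe,
                  static_cast<uint64_t>(runEnd[rg]) * rowIndexStride);
}
```

The table would be built once whenever the included row groups are recomputed (for example once per stripe), so its linear cost is paid once rather than on every batch. It also replaces the bit extraction from std::vector<bool>, which is where the shl/test instructions in the profile appear to come from, with a plain 32-bit load.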
--
This message was sent by Atlassian Jira
(v8.20.7#820007)