Quanlong Huang created ORC-1150:
-----------------------------------
Summary: Improve RowReaderImpl::computeBatchSize()
Key: ORC-1150
URL: https://issues.apache.org/jira/browse/ORC-1150
Project: ORC
Issue Type: Improvement
Components: C++
Reporter: Quanlong Huang
Attachments: RowReaderImpl_next_annotation.png,
image-2022-04-12-17-11-28-091.png
RowReaderImpl::computeBatchSize() can be on the hot path when sargs exist. The
following perf report shows that orc::RowReaderImpl::next() itself takes 1/4 of
the scan time. It was measured using orc-scan with the sarg "inv_quantity_on_hand
between -1 and 5000", scanning 4 ORC files of the TPCDS inventory table (768.23MB in
total size).
!image-2022-04-12-17-11-28-091.png|width=713,height=251!
Looking into its disassembly, the time is spent in a loop:
!RowReaderImpl_next_annotation.png|width=556,height=465!
The annotation indicates that this is the inlined RowReaderImpl::computeBatchSize()
method. Disassembly:
{code:java}
│ d0:┌─→mov %r14,%r15
0.36 │ │ mov %esi,%ecx
0.13 │ │ shr $0x6,%rdx
22.81 │ │ shl %cl,%r15
24.24 │ │ test %r15,(%r9,%rdx,8)
│ │↓ je fb
│ e2:│ lea 0x1(%rsi),%edx
0.22 │ │ mov %r10,%rax
0.18 │ │ imul %rdx,%rax
25.31 │ │ mov %rdx,%rsi
│ │ cmp %rdi,%rax
0.54 │ │ cmova %rdi,%rax
0.04 │ ├──cmp %r11,%rdx
23.79 │ └──jb d0
0.31 │ fb: sub %r8,%rax
{code}
The corresponding loop:
{code:cpp}
endRowInStripe = currentRowInStripe;
uint32_t rg = static_cast<uint32_t>(currentRowInStripe / rowIndexStride);
for (; rg < includedRowGroups.size(); ++rg) {
  if (!includedRowGroups[rg]) {
    break;
  } else {
    endRowInStripe = std::min(rowsInCurrentStripe, (rg + 1) * rowIndexStride);
  }
}
{code}
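For illustration only, here is a minimal standalone sketch of one possible direction: find the first excluded row group once, then compute the end row a single time instead of recomputing std::min on every iteration. It is not the change proposed or adopted in ORC. The free functions endRowByLoop and endRowBySearch and their parameter lists are hypothetical, and includedRowGroups is assumed to be a std::vector<bool>. Whether std::find over std::vector<bool> is actually faster depends on the standard library implementation, so it would need to be measured.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical free-function copy of the loop quoted above, kept as a baseline.
uint64_t endRowByLoop(const std::vector<bool>& includedRowGroups,
                      uint64_t currentRowInStripe,
                      uint64_t rowsInCurrentStripe,
                      uint64_t rowIndexStride) {
  assert(rowIndexStride > 0);
  uint64_t endRowInStripe = currentRowInStripe;
  uint32_t rg = static_cast<uint32_t>(currentRowInStripe / rowIndexStride);
  for (; rg < includedRowGroups.size(); ++rg) {
    if (!includedRowGroups[rg]) {
      break;
    }
    endRowInStripe = std::min(rowsInCurrentStripe, (rg + 1) * rowIndexStride);
  }
  return endRowInStripe;
}

// Hypothetical variant: locate the first excluded row group, then compute
// the end row once. Returns the same value as endRowByLoop.
uint64_t endRowBySearch(const std::vector<bool>& includedRowGroups,
                        uint64_t currentRowInStripe,
                        uint64_t rowsInCurrentStripe,
                        uint64_t rowIndexStride) {
  assert(rowIndexStride > 0);
  const uint64_t first = currentRowInStripe / rowIndexStride;
  if (first >= includedRowGroups.size()) {
    return currentRowInStripe;  // no row group left to scan
  }
  const auto begin =
      includedRowGroups.begin() + static_cast<std::ptrdiff_t>(first);
  const auto stop = std::find(begin, includedRowGroups.end(), false);
  if (stop == begin) {
    return currentRowInStripe;  // current row group is excluded
  }
  // Index of the first excluded row group (or size() if none is excluded).
  const uint64_t firstExcluded =
      static_cast<uint64_t>(stop - includedRowGroups.begin());
  return std::min(rowsInCurrentStripe, firstExcluded * rowIndexStride);
}
```

Both functions take the row-group bitmap, the current row, the stripe row count, and the row index stride, so they can be compared directly in a micro-benchmark before deciding whether such a change is worthwhile.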
--
This message was sent by Atlassian Jira
(v8.20.1#820001)