lyang24 commented on PR #9093:
URL: https://github.com/apache/arrow-rs/pull/9093#issuecomment-3737337607
it looks like it's doing well with large scan queries
```
arrow_reader_clickbench/async/Q20    1.18    130.2±1.59ms    ? ?/sec    1.00    110.7±0.82ms    ? ?/sec
arrow_reader_clickbench/async/Q21    1.29    165.5±0.99ms    ? ?/sec    1.00    128.6±0.93ms    ? ?/sec
arrow_reader_clickbench/async/Q22    1.24    318.7±11.80ms   ? ?/sec    1.00    257.3±6.26ms    ? ?/sec
```
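For reading these rows: the ratio column appears to be each mean time divided by the faster mean in the same row. A minimal Rust sketch that reproduces the displayed ratios from the reported means, assuming that interpretation (the helper name `relative_ratio` is made up for illustration):

```rust
/// Ratio of a measured mean to the fastest mean in the same row,
/// rounded to two decimals like the ratio column in the output.
fn relative_ratio(time_ms: f64, fastest_ms: f64) -> f64 {
    ((time_ms / fastest_ms) * 100.0).round() / 100.0
}

fn main() {
    // (benchmark, slower mean in ms, faster mean in ms), copied from the rows above.
    let rows = [
        ("Q20", 130.2, 110.7),
        ("Q21", 165.5, 128.6),
        ("Q22", 318.7, 257.3),
    ];
    for (name, slow, fast) in rows {
        println!("{name}: {:.2}x", relative_ratio(slow, fast));
    }
}
```

Ratios recomputed this way from already-rounded means can differ slightly from the displayed ones, since the tool presumably works from unrounded measurements.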
some regressions with high selectivity
```
arrow_reader_row_filter/int64 > 90/exclude_filter_column/async    1.00    2.6±0.02ms    ? ?/sec    1.31    3.5±0.08ms    ? ?/sec
arrow_reader_row_filter/int64 > 90/exclude_filter_column/sync     1.00    2.4±0.02ms    ? ?/sec    1.35    3.2±0.03ms    ? ?/sec
```
regression with Float16Array (FIXED_LEN_BYTE_ARRAY) decoding
```
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/plain encoded, mandatory, no NULLs      1.00    75.9±0.46µs     ? ?/sec    1.56    118.1±0.43µs    ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/plain encoded, optional, half NULLs     1.00    232.7±2.34µs    ? ?/sec    1.23    285.9±3.00µs    ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/plain encoded, optional, no NULLs       1.00    80.8±0.46µs     ? ?/sec    1.53    123.7±0.34µs    ? ?/sec
```
had ChatGPT summarize the results as
# Summary: Pre-allocation vs Main Path

Your `ViewBuffer::with_capacity` optimization has a **very consistent, architecture-level behavior** across all tested Parquet → Arrow decoding kernels.

---

## 🟢 Where It Wins

Kernel Type | Why | Speedup
-- | -- | --
BinaryView / StringView | Avoids repeated realloc of pointer & offset vectors | +5% → +25%
Dictionary encoded | Index + value indirection benefits from fixed capacity | +6% → +18%
ByteStreamSplit numeric | Chunked layout breaks streaming writes | +5% → +15%
Selective row filters (<40% survive) | Output small & unpredictable | +3% → +12%

These kernels are **flat POD, memory-bandwidth bound**; eager zero-touching destroys cache & streaming behavior.

---
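
To illustrate the "avoids repeated realloc" row in the summary above, here is a minimal sketch using only the standard library. It is not the PR's code: a plain `Vec<u128>` stands in for a buffer of 16-byte views, and the helper `count_reallocations` and the batch size are hypothetical.

```rust
/// Pushes `n` values into `buf` and counts how often its capacity changes,
/// used here as a proxy for how many times the backing buffer was reallocated.
fn count_reallocations(mut buf: Vec<u128>, n: usize) -> usize {
    let mut reallocations = 0;
    let mut last_capacity = buf.capacity();
    for i in 0..n {
        buf.push(i as u128);
        if buf.capacity() != last_capacity {
            reallocations += 1;
            last_capacity = buf.capacity();
        }
    }
    reallocations
}

fn main() {
    let n = 8192; // hypothetical batch size, chosen only for illustration

    // Starts empty and grows on demand as values are pushed.
    let grown = count_reallocations(Vec::new(), n);
    // Reserves room for all `n` values up front.
    let preallocated = count_reallocations(Vec::with_capacity(n), n);

    println!("grown: {grown} reallocations, preallocated: {preallocated}");
}
```

The sketch only shows the allocation-count side of the trade-off; it says nothing about the regressions reported above, which would need the actual benchmarks to explain.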
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.