ding-young commented on PR #7917:
URL: https://github.com/apache/arrow-rs/pull/7917#issuecomment-3077375154
- cargo bench result
| Case (str_len, null prob) | main | issue-6057 |
|---------------------------|--------------|--------------|
| string view(10, 0) | 51.23 µs | 52.18 µs |
| string view(30, 0) | 45.47 µs | 46.63 µs |
| string view(100, 0) | 64.18 µs | 68.54 µs |
| string view(100, 0.5) | 70.11 µs | 74.06 µs |
| string view(1..100, 0) | 100.72 µs | 103.80 µs |
| string view(1..100, 0.5) | 80.48 µs | 86.02 µs |
- manual memory profiling result (unit: bytes)
I added code to read jemalloc stats (allocated, resident, active) before and
after decoding a binary view. Memory usage improved, especially when short
strings are mixed with large strings. When the given rows consist only of
large strings, memory usage was the same.
```rust
let before = jemalloc_stat();
let view = if !validate_utf8 {
    decode_binary_view_inner_utf8_unchecked(rows, options)
} else {
    decode_binary_view_inner(rows, options, validate_utf8)
};
let after = jemalloc_stat();
// print ( after - before )
```
(To reproduce, see
https://github.com/ding-young/arrow-rs/tree/issue-6057-bench-mem )
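
For readers who want to try the before/after pattern without jemalloc, here is a minimal std-only sketch. It is not the code from the branch above: it swaps in a counting wrapper around the system allocator, so it only reports cumulative bytes requested and has no equivalent of jemalloc's active or resident figures. `CountingAlloc`, `allocated_bytes`, and `measure_alloc` are hypothetical names introduced for illustration.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Cumulative number of bytes requested from the allocator (never decreases).
static ALLOCATED: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical wrapper around the system allocator that counts requested bytes.
/// This stands in for jemalloc stats; it is an assumption, not the PR's helper.
struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            // Count only successful allocations.
            ALLOCATED.fetch_add(layout.size(), Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // Frees are forwarded but not subtracted: the counter is cumulative.
        unsafe { System.dealloc(ptr, layout) };
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Snapshot of the cumulative byte counter.
fn allocated_bytes() -> usize {
    ALLOCATED.load(Ordering::Relaxed)
}

/// Runs `f` and returns its result together with the bytes requested while it ran.
fn measure_alloc<T>(f: impl FnOnce() -> T) -> (T, usize) {
    let before = allocated_bytes();
    let out = f();
    let after = allocated_bytes();
    (out, after.wrapping_sub(before))
}
```

Wrapping the decode call in a closure passed to `measure_alloc` would give a rough "bytes requested" delta. Other threads allocating at the same time are counted too, so the number is an upper bound for the measured call rather than an exact figure.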
| Case | main (alloc / active) | issue-6057 (alloc / active) |
|---------------------------|----------------------|-----------------------------|
| string view(10, 0) | **102656 / 114688** | **65536 / 69632** |
| string view(30, 0) | 196608 / 204800 | 196608 / 204800 |
| string view(100, 0) | 524288 / 532480 | 524288 / 532480 |
| string view(100, 0.5) | 294912 / 303104 | 294912 / 303104 |
| string view(1..100, 0) | 294912 / 303104 | 294912 / 303104 |
| string view(1..100, 0.5) | **180224 / 188416** | **163840 / 172032** |
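
To put the two tables on a common scale, the relative change of the branch against main can be computed as `(branch - main) / main`. The sketch below does this for a few rows copied from the tables; `percent_change` and `ROWS` are hypothetical names used only for illustration.

```rust
/// Percent change of `branch` relative to `main`.
/// Negative values mean the branch is lower (less time or less memory).
fn percent_change(main: f64, branch: f64) -> f64 {
    (branch - main) / main * 100.0
}

/// A few (label, main, issue-6057) rows taken from the tables in this comment.
const ROWS: [(&str, f64, f64); 4] = [
    ("time  string view(10, 0) [µs]", 51.23, 52.18),
    ("time  string view(1..100, 0.5) [µs]", 80.48, 86.02),
    ("alloc string view(10, 0) [B]", 102656.0, 65536.0),
    ("alloc string view(1..100, 0.5) [B]", 180224.0, 163840.0),
];

/// Formats one line per row, e.g. for printing in a benchmark report.
fn summarize() -> Vec<String> {
    ROWS.iter()
        .map(|(label, main, branch)| {
            format!("{label}: {:+.2}%", percent_change(*main, *branch))
        })
        .collect()
}
```

Printing each line of `summarize()` shows the time cost next to the memory saving for the same case, which makes the trade-off easier to read than the raw numbers.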