etseidl commented on PR #8111:
URL: https://github.com/apache/arrow-rs/pull/8111#issuecomment-3191726394
Latest run of the metadata bench. Still comparing the old thrift-generated
code to the new code:
```
open(default)           time:   [35.073 µs 35.124 µs 35.179 µs]
open(page index)        time:   [1.7446 ms 1.7524 ms 1.7639 ms]
decode parquet metadata time:   [34.361 µs 34.505 µs 34.735 µs]
decode thrift file metadata
                        time:   [22.239 µs 22.420 µs 22.644 µs]
decode parquet metadata new
                        time:   [19.438 µs 19.486 µs 19.542 µs]
decode parquet metadata (wide)
                        time:   [215.90 ms 216.69 ms 217.55 ms]
decode thrift file metadata (wide)
                        time:   [109.33 ms 109.73 ms 110.14 ms]
decode parquet metadata new (wide)
                        time:   [80.491 ms 80.788 ms 81.093 ms]
page headers            time:   [8.6857 µs 8.7251 µs 8.7688 µs]
```
No major changes since my last round
(https://github.com/apache/arrow-rs/pull/8072#issuecomment-3166365107).
The new code takes over 40% less time for the smaller schema, and 63% less
for the wide (1000-column) schema.
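A quick way to sanity-check those percentages: take the middle (point) estimate from each criterion `[low estimate high]` triple and compute the reduction in time as `(old - new) / old`. This is only a back-of-the-envelope sketch; `percent_less_time` is an illustrative helper, not anything in arrow-rs.

```rust
/// Reduction in run time going from `old` to `new`, as a percentage of `old`.
/// (Illustrative helper only, not part of arrow-rs.)
fn percent_less_time(old: f64, new: f64) -> f64 {
    (old - new) / old * 100.0
}

fn main() {
    // Middle estimates from the first run: old decoder vs. new decoder.
    let small = percent_less_time(34.505, 19.486); // µs, small schema
    let wide = percent_less_time(216.69, 80.788); // ms, wide (1000-column) schema
    println!("small schema: {small:.1}% less time");
    println!("wide schema:  {wide:.1}% less time");
}
```

This prints about 43.5% for the small schema and 62.7% for the wide one, in line with the figures above.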
On a heftier machine the speedup for the wide schema isn't quite as dramatic,
but both schemas still see around a 2X improvement.
```
open(default)           time:   [14.366 µs 14.397 µs 14.432 µs]
open(page index)        time:   [641.58 µs 643.12 µs 644.71 µs]
decode parquet metadata time:   [13.787 µs 13.824 µs 13.862 µs]
decode thrift file metadata
                        time:   [9.6280 µs 9.6709 µs 9.7208 µs]
decode parquet metadata new
                        time:   [6.6013 µs 6.6182 µs 6.6364 µs]
decode parquet metadata (wide)
                        time:   [59.198 ms 59.337 ms 59.494 ms]
decode thrift file metadata (wide)
                        time:   [46.278 ms 46.377 ms 46.481 ms]
decode parquet metadata new (wide)
                        time:   [30.504 ms 30.572 ms 30.646 ms]
page headers            time:   [4.0325 µs 4.0426 µs 4.0539 µs]
```
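The "around 2X" figure for the heftier machine can be checked the same way, this time expressed as a speedup ratio `old / new` using the middle estimate of each triple. Again just a sketch; `speedup` is an illustrative helper, not an arrow-rs function.

```rust
/// How many times faster `new` is than `old` (ratio of run times).
/// (Illustrative helper only, not part of arrow-rs.)
fn speedup(old: f64, new: f64) -> f64 {
    old / new
}

fn main() {
    // Middle estimates from the second run: old decoder vs. new decoder.
    let small = speedup(13.824, 6.6182); // µs, small schema
    let wide = speedup(59.337, 30.572); // ms, wide (1000-column) schema
    println!("small schema: {small:.2}x");
    println!("wide schema:  {wide:.2}x");
}
```

This comes out to roughly 2.09x for the small schema and 1.94x for the wide one.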
I'm going to merge this now and move on to the page indexes. I did a quick
test last night and was able to get the 1.75 ms for `open(page index)` down to
850 µs. There's some inefficiency in the way we decode the page indexes, so I'm
going to see if I can cut that time down some more (assuming the compiler
isn't doing something clever and reusing some `Vec` memory).
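A minimal sketch of the kind of allocation reuse that parenthetical alludes to, assuming the page index decoding builds a temporary `Vec` per column chunk. `Entry`, `decode_alloc`, and `decode_into` are hypothetical names for illustration, not the arrow-rs page index API. Passing in a caller-owned scratch buffer makes the reuse explicit rather than relying on the compiler or allocator:

```rust
/// Hypothetical decoded index entry (illustrative only, not an arrow-rs type).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Entry {
    offset: i64,
    size: i32,
}

/// Allocating version: every call builds and returns a fresh Vec.
fn decode_alloc(raw: &[(i64, i32)]) -> Vec<Entry> {
    raw.iter().map(|&(offset, size)| Entry { offset, size }).collect()
}

/// Reusing version: `clear()` drops the old contents but keeps the capacity,
/// so once the buffer is large enough, later calls do not reallocate.
fn decode_into(raw: &[(i64, i32)], out: &mut Vec<Entry>) {
    out.clear();
    out.extend(raw.iter().map(|&(offset, size)| Entry { offset, size }));
}

fn main() {
    // Hypothetical raw (offset, size) pairs for two column chunks.
    let columns: Vec<Vec<(i64, i32)>> = vec![vec![(0, 100), (100, 120)], vec![(220, 80)]];
    let mut scratch = Vec::new();
    for raw in &columns {
        decode_into(raw, &mut scratch);
        // Both versions decode the same entries; only the allocation pattern differs.
        assert_eq!(scratch, decode_alloc(raw));
        println!("{} entries, capacity {}", scratch.len(), scratch.capacity());
    }
}
```

This only pays off for intermediate buffers; decoded results that must outlive the loop still need their own storage.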
--
This is an automated message from the Apache Git Service.