etseidl commented on PR #8111: URL: https://github.com/apache/arrow-rs/pull/8111#issuecomment-3191726394
Latest run of the metadata bench. Still comparing the old thrift generated code to the new:

```
open(default)                            time: [35.073 µs 35.124 µs 35.179 µs]
open(page index)                         time: [1.7446 ms 1.7524 ms 1.7639 ms]
decode parquet metadata                  time: [34.361 µs 34.505 µs 34.735 µs]
decode thrift file metadata              time: [22.239 µs 22.420 µs 22.644 µs]
decode parquet metadata new              time: [19.438 µs 19.486 µs 19.542 µs]
decode parquet metadata (wide)           time: [215.90 ms 216.69 ms 217.55 ms]
decode thrift file metadata (wide)       time: [109.33 ms 109.73 ms 110.14 ms]
decode parquet metadata new (wide)       time: [80.491 ms 80.788 ms 81.093 ms]
page headers                             time: [8.6857 µs 8.7251 µs 8.7688 µs]
```

No major changes since my last round (https://github.com/apache/arrow-rs/pull/8072#issuecomment-3166365107). The new code is some 40% faster for the smaller schema, and 63% faster for the wide (1000 columns) schema.

On a heftier machine the speedup isn't quite as dramatic, but still around a 2X improvement.

```
open(default)                            time: [14.366 µs 14.397 µs 14.432 µs]
open(page index)                         time: [641.58 µs 643.12 µs 644.71 µs]
decode parquet metadata                  time: [13.787 µs 13.824 µs 13.862 µs]
decode thrift file metadata              time: [9.6280 µs 9.6709 µs 9.7208 µs]
decode parquet metadata new              time: [6.6013 µs 6.6182 µs 6.6364 µs]
decode parquet metadata (wide)           time: [59.198 ms 59.337 ms 59.494 ms]
decode thrift file metadata (wide)       time: [46.278 ms 46.377 ms 46.481 ms]
decode parquet metadata new (wide)       time: [30.504 ms 30.572 ms 30.646 ms]
page headers                             time: [4.0325 µs 4.0426 µs 4.0539 µs]
```

I'm going to merge this now and move on to the page indexes. I did a quick test last night and was able to get the 1.75ms for `open(page index)` down to 850us. There's some inefficiency in the way we decode the page indexes, so I'm going to see if I can cut that time down some more (assuming the compiler isn't doing something clever and reusing some vec memory).
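The percentages and the "around 2X" figure quoted above can be reproduced from the median timings. Below is a minimal sketch, assuming "X% faster" is read as the relative reduction in median time between `decode parquet metadata` and `decode parquet metadata new`; the helper names `percent_reduction` and `speedup` are hypothetical and not part of the bench.

```rust
/// Percent reduction in time going from `old` to `new` (same units).
/// Hypothetical helper, for illustration only.
fn percent_reduction(old: f64, new: f64) -> f64 {
    (old - new) / old * 100.0
}

/// Speedup factor of `new` relative to `old` (same units).
/// Hypothetical helper, for illustration only.
fn speedup(old: f64, new: f64) -> f64 {
    old / new
}

fn main() {
    // Median timings copied from the two runs above:
    // (label, old decode time, new decode time). Units cancel within a row.
    let rows = [
        ("first machine, default schema (µs)", 34.505, 19.486),
        ("first machine, wide schema (ms)", 216.69, 80.788),
        ("heftier machine, default schema (µs)", 13.824, 6.6182),
        ("heftier machine, wide schema (ms)", 59.337, 30.572),
    ];
    for (label, old, new) in rows {
        println!(
            "{label}: {:.1}% less time, {:.2}x speedup",
            percent_reduction(old, new),
            speedup(old, new)
        );
    }
}
```

Only the medians (the middle value of each bracketed triple) are used here; the lower and upper bounds are ignored for simplicity.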