etseidl commented on PR #8111:
URL: https://github.com/apache/arrow-rs/pull/8111#issuecomment-3191726394

   Latest run of the metadata bench, still comparing the old Thrift-generated 
code to the new:
   ```
   open(default)           time:   [35.073 µs 35.124 µs 35.179 µs]
   open(page index)        time:   [1.7446 ms 1.7524 ms 1.7639 ms]
   decode parquet metadata time:   [34.361 µs 34.505 µs 34.735 µs]
   decode thrift file metadata
                           time:   [22.239 µs 22.420 µs 22.644 µs]
   decode parquet metadata new
                           time:   [19.438 µs 19.486 µs 19.542 µs]
   decode parquet metadata (wide)
                           time:   [215.90 ms 216.69 ms 217.55 ms]
   decode thrift file metadata (wide)
                           time:   [109.33 ms 109.73 ms 110.14 ms]
   decode parquet metadata new (wide)
                           time:   [80.491 ms 80.788 ms 81.093 ms]
   page headers            time:   [8.6857 µs 8.7251 µs 8.7688 µs]
   ```
   No major changes since my last round 
(https://github.com/apache/arrow-rs/pull/8072#issuecomment-3166365107).
   
   The new code is some 40% faster for the smaller schema (34.5 µs down to 
19.5 µs), and 63% faster for the wide (1000 columns) schema (217 ms down to 
81 ms).
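
For anyone wanting to reproduce those percentages, here is a minimal sketch of the arithmetic. It assumes the comparison is between the median of `decode parquet metadata` and the median of `decode parquet metadata new`, and that "faster" means the fraction of run time saved. The helper name `percent_reduction` is made up for illustration and is not part of the bench.

```rust
/// Percentage of run time saved when going from `old` to `new`.
/// Both arguments must be in the same unit.
/// (Hypothetical helper, not part of the arrow-rs benchmarks.)
fn percent_reduction(old: f64, new: f64) -> f64 {
    (old - new) / old * 100.0
}

fn main() {
    // Median timings copied from the first results table.
    let small = percent_reduction(34.505, 19.486); // microseconds
    let wide = percent_reduction(216.69, 80.788); // milliseconds

    println!("small schema: {small:.1}% less time");
    println!("wide schema:  {wide:.1}% less time");
}
```

Under those assumptions this works out to roughly 43.5% and 62.7%.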
   
   On a heftier machine the speedup isn't quite as dramatic, but still around 
a 2X improvement.
   ```
   open(default)           time:   [14.366 µs 14.397 µs 14.432 µs]
   open(page index)        time:   [641.58 µs 643.12 µs 644.71 µs]
   decode parquet metadata time:   [13.787 µs 13.824 µs 13.862 µs]
   decode thrift file metadata
                           time:   [9.6280 µs 9.6709 µs 9.7208 µs]
   decode parquet metadata new
                           time:   [6.6013 µs 6.6182 µs 6.6364 µs]
   decode parquet metadata (wide)
                           time:   [59.198 ms 59.337 ms 59.494 ms]
   decode thrift file metadata (wide)
                           time:   [46.278 ms 46.377 ms 46.481 ms]
   decode parquet metadata new (wide)
                           time:   [30.504 ms 30.572 ms 30.646 ms]
   page headers            time:   [4.0325 µs 4.0426 µs 4.0539 µs]
   ```
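
As a rough check of the "around 2X" figure, the ratio of old to new median times can be computed directly. This sketch again assumes the relevant rows are `decode parquet metadata` and `decode parquet metadata new`; `speedup_factor` is a made-up helper name.

```rust
/// How many times faster `new` is than `old` (old time divided by new time).
/// (Hypothetical helper, not part of the arrow-rs benchmarks.)
fn speedup_factor(old: f64, new: f64) -> f64 {
    old / new
}

fn main() {
    // Median timings copied from the second results table.
    let small = speedup_factor(13.824, 6.6182); // microseconds
    let wide = speedup_factor(59.337, 30.572); // milliseconds

    println!("small schema: {small:.2}x");
    println!("wide schema:  {wide:.2}x");
}
```

Both ratios land close to 2 (about 2.09 and 1.94).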
   
   I'm going to merge this now and move on to the page indexes. I did a quick 
test last night and was able to get the 1.75 ms for `open(page index)` down to 
850 µs. There's some inefficiency in the way we decode the page indexes, so I'm 
going to see if I can cut that time down some more. (Assuming the compiler 
isn't doing something clever and reusing some vec memory.)

