[PR] Restructure compare_greater function used in parquet statistics for better performance [arrow-rs]

via GitHub Sat, 12 Jul 2025 04:14:45 -0700


jhorstmann opened a new pull request, #7916:
URL: https://github.com/apache/arrow-rs/pull/7916


   # Which issue does this PR close?
   
   Another small optimization to parquet writing, a follow-up to #7822 (I can 
create a separate issue if needed).
   
   # Rationale for this change
   
   Improves the performance in the microbenchmark for writing primitive types 
by around 6%:
   
   ```
   write_batch primitive/4096 values primitive
                           time:   [437.72 µs 439.91 µs 442.40 µs]
                           thrpt:  [397.68 MiB/s 399.93 MiB/s 401.93 MiB/s]
                    change:
                           time:   [-6.7582% -6.2865% -5.7391%] (p = 0.00 < 0.05)
                           thrpt:  [+6.0885% +6.7082% +7.2480%]
                           Performance has improved.
   write_batch primitive/4096 values primitive non-null
                           time:   [358.86 µs 359.39 µs 359.98 µs]
                           thrpt:  [479.24 MiB/s 480.03 MiB/s 480.74 MiB/s]
                    change:
                           time:   [-6.7127% -6.4322% -6.1675%] (p = 0.00 < 0.05)
                           thrpt:  [+6.5729% +6.8744% +7.1957%]
                           Performance has improved.
   ```
   
   # What changes are included in this PR?
   
   This restructures the code in `compare_greater` to check the generic type 
parameter first, including in all of the special cases. The main difference, 
and probably what enables LLVM to generate better code, is that `as_u64` is 
now only called for types where its implementation is actually infallible.
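
To illustrate the idea, here is a minimal sketch, not the actual code from this PR. The `PhysicalType` enum, the `Value` trait, and the `is_unsigned` flag are hypothetical, simplified stand-ins for the parquet crate's types. The physical type is matched first, so the fallible `as_u64` conversion is only reached for integer types, where it cannot fail:

```rust
// Hypothetical, simplified stand-ins; not the parquet crate's real types.
enum PhysicalType {
    Int32,
    Int64,
    Float,
}

trait Value: PartialOrd + Copy {
    const PHYSICAL_TYPE: PhysicalType;
    // Fallible in general: only integer types can be converted.
    fn as_u64(&self) -> Result<u64, String>;
}

impl Value for i32 {
    const PHYSICAL_TYPE: PhysicalType = PhysicalType::Int32;
    fn as_u64(&self) -> Result<u64, String> {
        // Reinterpret the bits as unsigned, then widen.
        Ok(*self as u32 as u64)
    }
}

impl Value for i64 {
    const PHYSICAL_TYPE: PhysicalType = PhysicalType::Int64;
    fn as_u64(&self) -> Result<u64, String> {
        Ok(*self as u64)
    }
}

impl Value for f32 {
    const PHYSICAL_TYPE: PhysicalType = PhysicalType::Float;
    fn as_u64(&self) -> Result<u64, String> {
        Err("as_u64 is not supported for FLOAT".to_string())
    }
}

// `is_unsigned` stands in for the unsigned-integer logical type check.
fn compare_greater<T: Value>(is_unsigned: bool, a: &T, b: &T) -> bool {
    match T::PHYSICAL_TYPE {
        // The type check comes first, so this arm is only compiled into the
        // monomorphized code for types where `as_u64` always returns `Ok`.
        PhysicalType::Int32 | PhysicalType::Int64 if is_unsigned => {
            a.as_u64().unwrap() > b.as_u64().unwrap()
        }
        // Every other combination uses the natural ordering of the type.
        _ => a > b,
    }
}
```

Because `T::PHYSICAL_TYPE` is a compile-time constant in this sketch, the match can be resolved per instantiation, and the error path of `as_u64` never has to appear in the code generated for types like `f32`.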
   
   I looked into also specializing the `get_min_max` function by moving the 
logical type checks outside of the loop, but that did not bring any further 
measurable improvement.
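
For context, the kind of specialization described above could look roughly like the following sketch. The function names and the `is_unsigned` flag are hypothetical and only cover `i32`; this is not the crate's actual `get_min_max`. The logical type is inspected once, and each branch runs a loop whose comparison contains no per-element type check:

```rust
// Hypothetical helper: the comparison is passed in as a closure, so the loop
// is monomorphized separately for each comparison.
fn min_max_by<T: Copy>(values: &[T], greater: impl Fn(&T, &T) -> bool) -> Option<(T, T)> {
    let mut iter = values.iter().copied();
    let first = iter.next()?;
    let (mut min, mut max) = (first, first);
    for v in iter {
        if greater(&min, &v) {
            min = v;
        }
        if greater(&v, &max) {
            max = v;
        }
    }
    Some((min, max))
}

// Hypothetical entry point for i32 values only.
fn get_min_max_i32(values: &[i32], is_unsigned: bool) -> Option<(i32, i32)> {
    // The logical type check happens once, outside the loop.
    if is_unsigned {
        min_max_by(values, |a, b| (*a as u32) > (*b as u32))
    } else {
        min_max_by(values, |a, b| a > b)
    }
}
```

As noted above, moving the checks out of the loop gave no further measurable improvement in the benchmark, so this is shown only to make the idea concrete.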
   
   # Are these changes tested?
   
   Should already be covered by existing unit tests.
   
   # Are there any user-facing changes?
   
   No. As far as I'm aware, the logical types for unsigned integers should only 
ever be used for the INT32 and INT64 physical types. The previous code would 
have failed at runtime in `as_u64` if that were not the case.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
