[PR] bench(parquet): add `ListArray` benchmarks for runtime and peak memory [arrow-rs]

via GitHub Tue, 28 Apr 2026 15:48:52 -0700


HippoBaro opened a new pull request, #9846:
URL: https://github.com/apache/arrow-rs/pull/9846


   # Which issue does this PR close?
   
   - Contributes to #9731
   
   # Rationale for this change
   
   The existing benchmarks leave gaps in the column types they exercise. 
I would also like to reduce the memory footprint (RSS) of the read/decode 
path, especially for sparse inputs, but we currently have no infrastructure 
to measure it.
   
   # What changes are included in this PR?
   
   Extend the existing `arrow_reader` runtime benchmarks with `Int32` and 
`FixedBinary32` list columns alongside the existing `StringList`, with 
parameterized null density (0%, 50%, 90%, 99%). The prior benchmarks only 
covered string lists, which didn't surface costs specific to fixed-width and 
primitive element types.
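
   The null-density parameterization can be illustrated with a small sketch. It is not the PR's generator: `SplitMix64`, `list_layout`, the fixed `list_len`, and the choice to apply nulls at the list level (rather than to elements) are all assumptions made for illustration, and the code builds only plain offsets and a validity vector instead of real Arrow or Parquet data.

```rust
/// Minimal SplitMix64 generator so the sketch needs no external crates.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform f64 in [0, 1) built from the top 53 bits.
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Hypothetical helper: list offsets and validity for `rows` list slots.
/// Each slot is null with probability `null_density`; a valid slot holds
/// `list_len` elements, a null slot holds none (its offset does not advance).
fn list_layout(rows: usize, null_density: f64, list_len: i32, seed: u64) -> (Vec<i32>, Vec<bool>) {
    let mut rng = SplitMix64(seed);
    let mut offsets = Vec::with_capacity(rows + 1);
    let mut validity = Vec::with_capacity(rows);
    let mut end = 0i32;
    offsets.push(end);
    for _ in 0..rows {
        let valid = rng.next_f64() >= null_density;
        if valid {
            end += list_len;
        }
        offsets.push(end);
        validity.push(valid);
    }
    (offsets, validity)
}
```

   Under these assumptions, a benchmark would call `list_layout` once per density (0.0, 0.5, 0.9, 0.99) with a fixed seed so that runs stay comparable.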
   
   Add a new `arrow_reader_peak_memory` benchmark that measures peak heap usage 
during `ListArrayReader::consume_batch` using a thread-local tracking 
allocator. Peak heap usage serves as a proxy for how RSS-efficient we are when 
materializing a column into its final Arrow in-memory representation.
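
   For readers unfamiliar with the technique, a thread-local tracking allocator can be sketched as below. This is a minimal illustration built only on the standard library, not the code in this PR: `TrackingAllocator`, `record_alloc`, `record_dealloc`, and `measure_peak` are hypothetical names, and the sketch counts requested heap bytes on the calling thread, which approximates but does not equal RSS.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::cell::Cell;

thread_local! {
    // Live and peak heap bytes requested by the current thread.
    static CURRENT: Cell<usize> = const { Cell::new(0) };
    static PEAK: Cell<usize> = const { Cell::new(0) };
}

/// Hypothetical wrapper around the system allocator that keeps per-thread counters.
struct TrackingAllocator;

fn record_alloc(size: usize) {
    // try_with avoids a panic if the counters are touched during thread teardown.
    let _ = CURRENT.try_with(|cur| {
        let now = cur.get() + size;
        cur.set(now);
        let _ = PEAK.try_with(|peak| peak.set(peak.get().max(now)));
    });
}

fn record_dealloc(size: usize) {
    // Saturate: memory may have been allocated on another thread.
    let _ = CURRENT.try_with(|cur| cur.set(cur.get().saturating_sub(size)));
}

unsafe impl GlobalAlloc for TrackingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            record_alloc(layout.size());
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        record_dealloc(layout.size());
    }
}

#[global_allocator]
static GLOBAL: TrackingAllocator = TrackingAllocator;

/// Runs `f` and returns its result plus the peak number of heap bytes
/// the current thread held above its starting level while `f` ran.
fn measure_peak<T>(f: impl FnOnce() -> T) -> (T, usize) {
    let baseline = CURRENT.with(|c| c.get());
    PEAK.with(|p| p.set(baseline));
    let out = f();
    let peak = PEAK.with(|p| p.get());
    (out, peak.saturating_sub(baseline))
}
```

   In a benchmark, the closure passed to `measure_peak` would wrap the call under test (here, presumably the `consume_batch` call). Keeping the counters thread-local means allocations made by other threads do not pollute the measurement.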
   
   # Are these changes tested?
   
   All tests pass.
   
   # Are there any user-facing changes?
   
   None.
   


