HippoBaro opened a new pull request, #9846: URL: https://github.com/apache/arrow-rs/pull/9846
# Which issue does this PR close?

- Contributes to #9731

# Rationale for this change

Existing benchmarks have some gaps in the types of columns they exercise. Additionally, I would like to improve the memory efficiency of the read/decode path in terms of RSS requirements, especially for sparse inputs, and we currently do not have any infrastructure to measure that.

# What changes are included in this PR?

Extend the existing `arrow_reader` runtime benchmarks with `Int32` and `FixedBinary32` list columns alongside the existing `StringList`, with parameterized null density (0%, 50%, 90%, 99%). The prior benchmarks only covered string lists, which didn't surface costs specific to fixed-width and primitive element types.

Add a new `arrow_reader_peak_memory` benchmark that measures peak heap usage during `ListArrayReader::consume_batch` using a thread-local tracking allocator. It captures how RSS-efficient we are when materializing a column into its final Arrow in-memory representation.

# Are these changes tested?

All tests passing.

# Are there any user-facing changes?

None.
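For readers unfamiliar with the technique, here is a minimal sketch of what a thread-local tracking allocator can look like. It is not the code from this PR: `TrackingAllocator`, `CURRENT`, `PEAK`, and `measure_peak` are hypothetical names chosen for illustration, and the sketch assumes the measured work allocates and frees on the calling thread. It relies only on `std::alloc::GlobalAlloc`, `std::alloc::System`, and `thread_local!`.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::cell::Cell;

// Per-thread counters (hypothetical names). `Cell<usize>` with const
// initialization needs no lazy setup and has no destructor, so touching
// these from inside the allocator does not itself allocate.
thread_local! {
    static CURRENT: Cell<usize> = const { Cell::new(0) };
    static PEAK: Cell<usize> = const { Cell::new(0) };
}

/// Wraps the system allocator and records live and peak bytes per thread.
pub struct TrackingAllocator;

fn record_alloc(size: usize) {
    let _ = CURRENT.try_with(|cur| {
        let now = cur.get().saturating_add(size);
        cur.set(now);
        let _ = PEAK.try_with(|peak| peak.set(peak.get().max(now)));
    });
}

fn record_dealloc(size: usize) {
    // Saturating: memory freed here may have been allocated on another thread.
    let _ = CURRENT.try_with(|cur| cur.set(cur.get().saturating_sub(size)));
}

unsafe impl GlobalAlloc for TrackingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            record_alloc(layout.size());
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        record_dealloc(layout.size());
    }

    unsafe fn realloc(&self, ptr: *mut u8, layout: Layout, new_size: usize) -> *mut u8 {
        let new_ptr = unsafe { System.realloc(ptr, layout, new_size) };
        if !new_ptr.is_null() {
            // Simplification: account as "free old, allocate new". This can
            // under-count the transient overlap when the block is moved.
            record_dealloc(layout.size());
            record_alloc(new_size);
        }
        new_ptr
    }
}

#[global_allocator]
static GLOBAL: TrackingAllocator = TrackingAllocator;

/// Runs `f` and returns its result together with the peak number of bytes
/// allocated on this thread above the level that was live when `f` started.
pub fn measure_peak<T>(f: impl FnOnce() -> T) -> (T, usize) {
    let baseline = CURRENT.with(|c| c.get());
    PEAK.with(|p| p.set(baseline));
    let out = f();
    let peak = PEAK.with(|p| p.get());
    (out, peak.saturating_sub(baseline))
}
```

A benchmark body would wrap the call under test, for example `measure_peak(|| reader.consume_batch())`, where `reader` stands in for an already constructed array reader. Keeping the counters thread-local means allocations made by other threads (the benchmark harness, for instance) do not pollute the measurement, at the cost of missing work that the measured code hands off to other threads.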
