HippoBaro commented on code in PR #9697:
URL: https://github.com/apache/arrow-rs/pull/9697#discussion_r3083361668
##########
parquet/src/util/push_buffers.rs:
##########
@@ -82,83 +100,166 @@ impl PushBuffers {
file_len,
ranges: Vec::new(),
buffers: Vec::new(),
+ #[cfg(feature = "arrow")]
+ watermark: 0,
+ sorted: true,
}
}
- /// Push all the ranges and buffers
+ /// Restore the sort invariant on `ranges`/`buffers`.
+ ///
+ /// Because IO completions are expected to generally arrive in-order,
+ /// `push_range` appends without sorting. We instead delay sorting until
+ /// consumption to amortize its cost, if necessary.
+ ///
+ /// This method must be called before any read-side operation that relies on
+ /// binary search (`has_range`, `get_bytes`, `release_through`,
+ /// `Read::read`). Callers that hold `&mut PushBuffers` should call this
+ /// once before lending `&PushBuffers` to read-side code.
+ pub fn ensure_sorted(&mut self) {
+ if self.sorted {
+ return;
+ }
+
+ // Insertion sort: zero-allocation and linear on nearly-sorted input
Review Comment:
Yes, this was fine under my mistaken assumption that files were always laid
out such that reading them back would be sequential. In that case, most
completed reads would land at the tail of the vec, and inserting buffers here
would mostly amount to inserting into an ordered, or nearly ordered, vec.