tustvold commented on code in PR #6590:
URL: https://github.com/apache/arrow-rs/pull/6590#discussion_r1807256219
##########
arrow-buffer/src/bytes.rs:
##########
@@ -44,6 +44,9 @@ pub struct Bytes {
/// how to deallocate this region
deallocation: Deallocation,
+
+ #[cfg(feature = "pool")]
+ reservation: std::sync::Mutex<Option<Box<dyn crate::MemoryReservation>>>,
Review Comment:
I originally used `parking_lot::Mutex` but this would have made Buffer not
RewindSafe, which would be a breaking change (and given there is a test for
this I presume this is important).
##########
arrow-buffer/Cargo.toml:
##########
@@ -28,11 +28,17 @@ include = { workspace = true }
edition = { workspace = true }
rust-version = { workspace = true }
+[package.metadata.docs.rs]
+features = ["pool"]
+
[lib]
name = "arrow_buffer"
path = "src/lib.rs"
bench = false
+[features]
+pool = []
Review Comment:
As it is very hard to guage what if any performance implications there are
of this, I think it is important that it is gated by a feature flag at least
initially.
##########
arrow-buffer/src/bytes.rs:
##########
@@ -96,6 +101,12 @@ impl Bytes {
}
}
+ /// Register this [`Bytes`] with the provided [`MemoryPool`]
+ #[cfg(feature = "pool")]
+ pub fn claim(&self, pool: &dyn crate::MemoryPool) {
+ *self.reservation.lock().unwrap() =
Some(pool.register(self.capacity()));
Review Comment:
There are three implications of this formulation:
* We always allocate a new memory reservation which could be wasteful
* We allocate from the new pool before freeing the memory from the previous
reservation
##########
arrow-buffer/src/pool.rs:
##########
@@ -0,0 +1,79 @@
+use std::sync::atomic::{AtomicUsize, Ordering};
+use std::sync::Arc;
+
+/// A [`MemoryPool`] can be used to track memory usage by
[`Buffer`](crate::Buffer)
+pub trait MemoryPool {
+ fn register(&self, size: usize) -> Box<dyn MemoryReservation>;
+}
+
+/// A memory reservation within a [`MemoryPool`] that is freed on drop
+pub trait MemoryReservation {
+ fn resize(&mut self, new: usize);
Review Comment:
I don't actually have a use-case for this atm, but this seemed sensible to
add now to avoid it being difficult later
##########
arrow-buffer/src/buffer/immutable.rs:
##########
@@ -357,6 +357,12 @@ impl Buffer {
pub fn ptr_eq(&self, other: &Self) -> bool {
self.ptr == other.ptr && self.length == other.length
}
+
+ /// Register this [`Buffer`] with the provided [`MemoryPool`]
+ #[cfg(feature = "pool")]
+ pub fn claim(&self, pool: &dyn crate::MemoryPool) {
Review Comment:
We could plumb this claim API through the various arrays and RecordBatch.
Consumers like DF could then claim `RecordBatch` they wish to buffer, and know
they aren't double-counting.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]