jordepic commented on code in PR #22862:
URL: https://github.com/apache/datafusion/pull/22862#discussion_r3385296517


##########
datafusion/common/src/utils/memory.rs:
##########
@@ -133,12 +133,29 @@ pub fn estimate_memory_size<T>(num_elements: usize, 
fixed_size: usize) -> Result
 pub fn get_record_batch_memory_size(batch: &RecordBatch) -> usize {
     // Store pointers to `Buffer`'s start memory address (instead of actual
     // used data region's pointer represented by current `Array`)
-    let mut counted_buffers: HashSet<NonNull<u8>> = HashSet::new();
+    let mut counted_buffers: HashSet<usize> = HashSet::new();
+    count_record_batch_memory_size(batch, &mut counted_buffers)
+}
+
+/// Calculate the memory used by `batch`, counting only `Buffer`s whose start
+/// addresses are not already present in `counted_buffers`, and record the
+/// batch's buffer addresses in the set.
+///
+/// Use this instead of [`get_record_batch_memory_size`] to account for the 
total
+/// memory of a sequence of batches that may share underlying buffers, e.g.
+/// zero-copy slices of one larger batch. Calling
+/// [`get_record_batch_memory_size`] on each such slice counts the shared
+/// buffers once per slice, while sharing `counted_buffers` across the calls
+/// counts each buffer exactly once.
+pub fn count_record_batch_memory_size(

Review Comment:
   @2010YOUY01 I went ahead and implemented that, thank you for the speedy 
review!  Feel free to merge it once CI passes :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to