alamb commented on code in PR #8040:
URL: https://github.com/apache/arrow-rs/pull/8040#discussion_r2283282889


##########
arrow/examples/memory_tracking.rs:
##########
@@ -0,0 +1,65 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Example demonstrating the Array memory tracking functionality
+
+use arrow_array::{Array, Int32Array, ListArray};
+use arrow_buffer::{MemoryPool, TrackingMemoryPool};
+use arrow_schema::{DataType, Field};
+use std::sync::Arc;
+
+fn main() {
+    let pool = TrackingMemoryPool::default();
+
+    println!("Arrow Array Memory Tracking Example");
+    println!("===================================");
+
+    // Basic array memory tracking
+    let array = Int32Array::from(vec![1, 2, 3, 4, 5]);
+    array.claim(&pool);
+    println!("Int32Array (5 elements): {} bytes", pool.used());
+
+    // Nested array (recursive tracking)
+    let offsets = arrow_buffer::OffsetBuffer::new(vec![0, 2, 4].into());
+    let field = Arc::new(Field::new("item", DataType::Int32, false));
+    let list_array = ListArray::new(field, offsets, Arc::new(array), None);
+
+    let before_list = pool.used();
+    list_array.claim(&pool);
+    let after_list = pool.used();
+    println!("ListArray (nested): +{} bytes", after_list - before_list);
+
+    // No double-counting for derived arrays
+    let large_array = Int32Array::from((0..1000).collect::<Vec<i32>>());
+    large_array.claim(&pool);
+    let original_usage = pool.used();
+    println!("Original array (1000 elements): {original_usage} bytes");
+
+    // Create and claim slices - should not increase memory usage
+    let slice1 = large_array.slice(0, 100);
+    let slice2 = large_array.slice(500, 200);
+
+    slice1.claim(&pool);
+    slice2.claim(&pool);
+    let final_usage = pool.used();
+
+    println!("After claiming 2 slices: {final_usage} bytes");

Review Comment:
   I think these should actually test the bytes used (not just print them out)
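   This sketch shows what asserting on the bytes, rather than printing them, could look like. It uses a std-only toy model so it runs without arrow. `ToyPool`, `ToyBuffer`, and `Reservation` are hypothetical names, not arrow-rs APIs. The model assumes deduplication works by storing one reservation inside the shared allocation. Claiming through any slice then replaces that reservation instead of adding another. With the real `TrackingMemoryPool`, exact totals may depend on buffer capacity. Comparing `pool.used()` before and after each `claim` is likely more robust than hard-coding totals.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a tracking pool: a shared byte counter.
#[derive(Default)]
struct ToyPool {
    used: Arc<AtomicUsize>,
}

/// Hands its bytes back to the pool when dropped.
struct Reservation {
    size: usize,
    counter: Arc<AtomicUsize>,
}

impl Drop for Reservation {
    fn drop(&mut self) {
        self.counter.fetch_sub(self.size, Ordering::SeqCst);
    }
}

impl ToyPool {
    fn reserve(&self, size: usize) -> Reservation {
        self.used.fetch_add(size, Ordering::SeqCst);
        Reservation { size, counter: Arc::clone(&self.used) }
    }

    fn used(&self) -> usize {
        self.used.load(Ordering::SeqCst)
    }
}

/// One shared allocation; every view of it shares this reservation slot.
struct Allocation {
    bytes: Vec<u8>,
    reservation: Mutex<Option<Reservation>>,
}

/// A whole buffer or a slice of one; slices share the same `Allocation`.
struct ToyBuffer {
    alloc: Arc<Allocation>,
    offset: usize,
    len: usize,
}

impl ToyBuffer {
    fn new(bytes: Vec<u8>) -> Self {
        let len = bytes.len();
        let alloc = Arc::new(Allocation { bytes, reservation: Mutex::new(None) });
        ToyBuffer { alloc, offset: 0, len }
    }

    fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        ToyBuffer { alloc: Arc::clone(&self.alloc), offset: self.offset + offset, len }
    }

    fn claim(&self, pool: &ToyPool) {
        let fresh = pool.reserve(self.alloc.bytes.len());
        // Storing the new reservation drops the previous one (if any), which
        // hands its bytes back, so re-claiming the same allocation is net zero.
        *self.alloc.reservation.lock().unwrap() = Some(fresh);
    }
}

fn main() {
    let pool = ToyPool::default();

    // Stand-in for a 1000-element Int32Array: 4000 bytes of values.
    let large = ToyBuffer::new(vec![0u8; 4000]);
    large.claim(&pool);
    assert_eq!(pool.used(), 4000);

    // Slices share the allocation, so claiming them must not change usage.
    let slice1 = large.slice(0, 400);
    let slice2 = large.slice(2000, 800);
    slice1.claim(&pool);
    slice2.claim(&pool);
    assert_eq!(pool.used(), 4000);

    // Dropping every view frees the allocation and releases its reservation.
    drop((large, slice1, slice2));
    assert_eq!(pool.used(), 0);
}
```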



##########
arrow-array/src/array/mod.rs:
##########
@@ -336,6 +336,34 @@ pub trait Array: std::fmt::Debug + Send + Sync {
     /// This value will always be greater than returned by `get_buffer_memory_size()` and
     /// includes the overhead of the data structures that contain the pointers to the various buffers.
     fn get_array_memory_size(&self) -> usize;
+
+    /// Claim memory used by this array in the provided memory pool.
+    ///
+    /// This recursively claims memory for:
+    /// - All data buffers in this array
+    /// - All child arrays (for nested types like List, Struct, etc.)
+    /// - The null bitmap buffer if present
+    ///
+    /// This method guarantees that the memory pool will only compute occupied memory
+    /// exactly once. For example, if this array is derived from operations like `slice`,
+    /// calling `claim` on it would not change the memory pool's usage if the underlying buffers
+    /// are already counted before.
+    ///
+    /// # Example
+    /// ```
+    /// # use arrow_array::{Int32Array, Array};
+    /// # use arrow_buffer::TrackingMemoryPool;
+    ///
+    /// let array = Int32Array::from(vec![1, 2, 3, 4, 5]);
+    /// let pool = TrackingMemoryPool::default();
+    ///
+    /// // Claim the array's memory in the pool
+    /// array.claim(&pool);

Review Comment:
   Could you also add an example (either here or elsewhere) of how one would use `claim`?
   
   For example, if we now did
   ```rust
   let array2 = array1.slice(0, 1);
   ```
   
   Is the idea that `array2.get_array_memory_size()` would now be zero?
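   One way to frame the question is to assume `claim` changes only the pool's accounting. Under that assumption, the slice's own size report stays untouched. The std-only sketch below models exactly that, and whether `Array::claim` behaves this way is for the PR to confirm. `RegistryPool`, `View`, and `buffer_memory_size` are hypothetical names, not arrow-rs APIs. The model also assumes a slice keeps reporting the size of its whole backing allocation.

```rust
use std::cell::RefCell;
use std::sync::Arc;

/// Hypothetical pool that remembers every allocation it has been handed.
#[derive(Default)]
struct RegistryPool {
    // Keeping an `Arc` clone pins each allocation, so `Arc::ptr_eq` can never
    // confuse a freed allocation with a new one at the same address.
    claimed: RefCell<Vec<Arc<Vec<i32>>>>,
}

impl RegistryPool {
    fn claim(&self, data: &Arc<Vec<i32>>) {
        let mut claimed = self.claimed.borrow_mut();
        if !claimed.iter().any(|seen| Arc::ptr_eq(seen, data)) {
            claimed.push(Arc::clone(data));
        }
    }

    fn used(&self) -> usize {
        self.claimed
            .borrow()
            .iter()
            .map(|d| d.len() * std::mem::size_of::<i32>())
            .sum()
    }
}

/// A toy Int32 "array": a window over a shared values allocation.
struct View {
    data: Arc<Vec<i32>>,
    offset: usize,
    len: usize,
}

impl View {
    fn slice(&self, offset: usize, len: usize) -> View {
        assert!(offset + len <= self.len, "slice out of bounds");
        View { data: Arc::clone(&self.data), offset: self.offset + offset, len }
    }

    /// Size of the whole backing allocation, independent of any pool.
    fn buffer_memory_size(&self) -> usize {
        self.data.len() * std::mem::size_of::<i32>()
    }

    fn claim(&self, pool: &RegistryPool) {
        pool.claim(&self.data);
    }
}

fn main() {
    let pool = RegistryPool::default();
    let array1 = View { data: Arc::new(vec![1, 2, 3, 4, 5]), offset: 0, len: 5 };
    array1.claim(&pool);

    let array2 = array1.slice(0, 1);
    array2.claim(&pool);

    // The pool counts the shared allocation once...
    assert_eq!(pool.used(), 20);
    // ...but the slice's own size report is unaffected by claiming.
    assert_eq!(array2.buffer_memory_size(), 20);
}
```

   A registry like this pins every claimed allocation for the pool's lifetime. Storing the reservation inside the shared allocation instead would let the memory be released when the last view is dropped.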
   



##########
arrow/examples/memory_tracking.rs:
##########
@@ -0,0 +1,65 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Example demonstrating the Array memory tracking functionality
+
+use arrow_array::{Array, Int32Array, ListArray};
+use arrow_buffer::{MemoryPool, TrackingMemoryPool};
+use arrow_schema::{DataType, Field};
+use std::sync::Arc;
+
+fn main() {

Review Comment:
   Yes, please. I think it would be much easier to find as a doc test; perhaps you could just move it to the `Array::claim` docs.
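   If the example becomes a doc test, its nested `ListArray` step is worth asserting too. Claiming the list after its already-claimed child should add only the offsets. The sketch below models recursive, deduplicated accounting with std only. `Node` and `claim_recursive` are hypothetical names, not arrow-rs APIs. The byte counts are the model's own and mirror the example's five `i32` values and three `i32` offsets. Real arrow buffers may report different sizes.

```rust
use std::collections::HashSet;
use std::sync::Arc;

/// Hypothetical array node: its own buffers plus child arrays.
struct Node {
    buffers: Vec<Arc<Vec<u8>>>,
    children: Vec<Arc<Node>>,
}

/// Recursively sums buffer sizes, counting each allocation once.
/// Pointer addresses are safe keys here because every node stays
/// alive for as long as `seen` is used.
fn claim_recursive(node: &Node, seen: &mut HashSet<*const Vec<u8>>, used: &mut usize) {
    for buf in &node.buffers {
        if seen.insert(Arc::as_ptr(buf)) {
            *used += buf.len();
        }
    }
    for child in &node.children {
        claim_recursive(child, seen, used);
    }
}

fn main() {
    let values = Arc::new(vec![0u8; 20]); // 5 x i32
    let offsets = Arc::new(vec![0u8; 12]); // 3 x i32
    let int_array = Arc::new(Node { buffers: vec![Arc::clone(&values)], children: vec![] });
    let list = Node { buffers: vec![offsets], children: vec![Arc::clone(&int_array)] };

    let mut seen = HashSet::new();
    let mut used = 0;
    claim_recursive(&int_array, &mut seen, &mut used);
    assert_eq!(used, 20);

    // The child's values were already counted; only the 12 offset bytes are new.
    claim_recursive(&list, &mut seen, &mut used);
    assert_eq!(used, 32);
}
```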



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
