alchemist51 commented on code in PR #9138:
URL: https://github.com/apache/arrow-rs/pull/9138#discussion_r2740878515


##########
arrow-memory-size-derive/src/lib.rs:
##########
@@ -0,0 +1,457 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Derive macro for the [`HeapSize`] trait.
+//!
+//! This crate provides a `#[derive(HeapSize)]` macro that automatically
+//! implements the `HeapSize` trait for structs and enums.
+//!
+//! # Example
+//!
+//! ```rust,ignore
+//! use arrow_memory_size::HeapSize;
+//! use arrow_memory_size_derive::HeapSize;
+//!
+//! #[derive(HeapSize)]
+//! struct MyStruct {
+//!     name: String,
+//!     data: Vec<u8>,
+//!     count: i32,
+//! }
+//! ```
+//!
+//! # Field Attributes
+//!
+//! The derive macro supports several attributes to customize behavior:
+//!
+//! ## `#[heap_size(ignore)]`
+//!
+//! Skip this field entirely (contributes 0 to heap size).
+//!
+//! ```rust,ignore
+//! #[derive(HeapSize)]
+//! struct MyStruct {
+//!     data: Vec<u8>,
+//!     #[heap_size(ignore)]
+//!     cached_hash: u64,  // Not counted
+//! }
+//! ```
+//!
+//! ## `#[heap_size(size = N)]`
+//!
+//! Use a constant value instead of calling `heap_size()`.
+//!
+//! ```rust,ignore
+//! #[derive(HeapSize)]
+//! struct MyStruct {
+//!     #[heap_size(size = 1024)]
+//!     fixed_buffer: *const u8,  // Known to be 1KB
+//! }
+//! ```
+//!
+//! ## `#[heap_size(size_fn = path)]`
+//!
+//! Call a custom function to compute the heap size.
+//! The function must have signature `fn(&FieldType) -> usize`.
+//!
+//! ```rust,ignore
+//! fn custom_size(data: &ExternalType) -> usize {
+//!     data.len() * 8
+//! }
+//!
+//! #[derive(HeapSize)]
+//! struct MyStruct {
+//!     #[heap_size(size_fn = custom_size)]

Review Comment:
   Loved the interface to add custom size for custom objects!



##########
arrow-array/src/heap_size.rs:
##########
@@ -0,0 +1,138 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! [`HeapSize`] implementations for arrow-array types
+
+use arrow_memory_size::HeapSize;
+
+use crate::Array;
+use crate::types::{ArrowDictionaryKeyType, ArrowPrimitiveType, 
RunEndIndexType};
+use crate::{
+    BinaryArray, BinaryViewArray, BooleanArray, DictionaryArray, 
FixedSizeBinaryArray,
+    FixedSizeListArray, LargeBinaryArray, LargeListArray, LargeListViewArray, 
LargeStringArray,
+    ListArray, ListViewArray, MapArray, NullArray, PrimitiveArray, RunArray, 
StringArray,
+    StringViewArray, StructArray, UnionArray,
+};
+
+// Note: A blanket implementation `impl<T: Array> HeapSize for T` would be 
ideal,
+// but is not possible due to Rust's orphan rules (E0210) since HeapSize is 
defined
+// in a separate crate.
+//
+// Note: HeapSize cannot be implemented for ArrayRef (Arc<dyn Array>) here due 
to
+// Rust's orphan rules. Use array.get_buffer_memory_size() directly instead.
+
+/// Implements HeapSize for array types that delegate to 
get_buffer_memory_size()
+macro_rules! impl_heap_size {
+    ($($ty:ty),*) => {
+        $(
+            impl HeapSize for $ty {
+                fn heap_size(&self) -> usize {
+                    self.get_buffer_memory_size()
+                }
+            }
+        )*
+    };
+}
+
+impl_heap_size!(
+    BooleanArray,
+    NullArray,
+    StringArray,
+    LargeStringArray,
+    BinaryArray,
+    LargeBinaryArray,
+    StringViewArray,
+    BinaryViewArray,
+    FixedSizeBinaryArray,
+    ListArray,
+    LargeListArray,
+    ListViewArray,
+    LargeListViewArray,
+    FixedSizeListArray,
+    StructArray,
+    MapArray,
+    UnionArray
+);
+
+impl<T: ArrowPrimitiveType> HeapSize for PrimitiveArray<T> {
+    fn heap_size(&self) -> usize {
+        self.get_buffer_memory_size()
+    }
+}
+
+impl<K: ArrowDictionaryKeyType> HeapSize for DictionaryArray<K> {
+    fn heap_size(&self) -> usize {
+        self.get_buffer_memory_size()
+    }
+}
+
+impl<R: RunEndIndexType> HeapSize for RunArray<R> {
+    fn heap_size(&self) -> usize {
+        self.get_buffer_memory_size()
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use crate::Int32Array;
+
+    #[test]
+    fn test_primitive_array_heap_size() {
+        let array = Int32Array::from(vec![1, 2, 3, 4, 5]);
+        let size = array.heap_size();
+        assert!(size >= 5 * std::mem::size_of::<i32>());

Review Comment:
   Can it not be accurately compared?



##########
arrow-memory-size/README.md:
##########
@@ -0,0 +1,231 @@
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+# `arrow-memory-size`
+
+[![crates.io](https://img.shields.io/crates/v/arrow-memory-size.svg)](https://crates.io/crates/arrow-memory-size)
+[![docs.rs](https://img.shields.io/docsrs/arrow-memory-size.svg)](https://docs.rs/arrow-memory-size/latest/arrow_memory_size/)
+
+Memory size estimation utilities for [Apache Arrow].
+
+This crate provides the `HeapSize` trait for calculating heap memory usage of 
data structures.
+
+[Apache Arrow]: https://arrow.apache.org/
+
+## Why This Crate?
+
+Several memory size estimation crates exist in the Rust ecosystem ([deepsize], 
[get-size2], etc.), but none has emerged as a clear standard. Rather than take 
a dependency on any of them, this crate provides a minimal `HeapSize` trait 
with a small API surface that can be implemented across the Arrow ecosystem.
+
+Key motivations:
+
+- **Minimal API**: Just two methods (`heap_size()` and `total_size()`)

Review Comment:
   I didn't see the total_size() being exposed, can you point me to that 
implementation? In case we are targeting for total_size. One can't use the 
ignore type in FieldAttr right? Since the memory allocation has to belong to 
either stack or heap? Should we keep that type?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to