ritchie46 commented on a change in pull request #382:
URL: https://github.com/apache/arrow-rs/pull/382#discussion_r644194518



##########
File path: arrow/src/array/transform/mod.rs
##########
@@ -338,33 +338,61 @@ fn preallocate_str_buffer<Offset: StringOffsetSizeTrait>(
     } else {
         buffer.push(0i32)
     }
-    let str_values_size = arrays
-        .iter()
-        .map(|data| {
-            // get the length of the value buffer
-            let buf_len = data.buffers()[1].len();
-            // find the offset of the buffer
-            // this returns a slice of offsets, starting from the offset of 
the array
-            // so we can take the first value
-            let offset = data.buffer::<Offset>(0)[0];
-            buf_len - offset.to_usize().unwrap()
-        })
-        .sum::<usize>();
 
     [
         buffer,
-        MutableBuffer::new(str_values_size * mem::size_of::<u8>()),
+        MutableBuffer::new(binary_size * mem::size_of::<u8>()),
     ]
 }
 
+/// Define capacities of child data or data buffers.
+#[derive(Debug, Clone)]
+pub enum Capacities {
+    /// Binary, Utf8 and LargeUtf8 data types
+    /// Define
+    /// * the capacity of the array offsets
+    /// * the capacity of the binary/ str buffer
+    Binary(usize, Option<usize>),
+    /// List and LargeList data types
+    /// Define
+    /// * the capacity of the array offsets
+    /// * the capacity of the child data
+    List(usize, Option<Box<Capacities>>),
+    /// Struct type
+    /// * the capacity of the array
+    /// * the capacities of the fields
+    Struct(usize, Option<Vec<Capacities>>),
+    /// Dictionary type
+    /// * the capacity of the array

Review comment:
       > What is the difference here between the "capacity of the array" and 
"the capacity of the keys"?
   
   Yes, that can be the same.
   
   I also understand the use of defining the merge strategy. Regarding  the 
API, should we make that available in a different method. Similar to `HashMap` 
having a method:
   
   ```rust
   HashMap::with_capacity(...)
   // and
   HashMap::with_capacity_and_hasher(...)
   ```
   
   We could do
   
   ```rust
   MutableArrayData::with_capacies(...)
   // and
   MutableArrayData::with_capacities_and_merge_strategy(...)
   ```
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to