ritchie46 commented on a change in pull request #382:
URL: https://github.com/apache/arrow-rs/pull/382#discussion_r644194518
##########
File path: arrow/src/array/transform/mod.rs
##########
@@ -338,33 +338,61 @@ fn preallocate_str_buffer<Offset: StringOffsetSizeTrait>(
} else {
buffer.push(0i32)
}
- let str_values_size = arrays
- .iter()
- .map(|data| {
- // get the length of the value buffer
- let buf_len = data.buffers()[1].len();
- // find the offset of the buffer
- // this returns a slice of offsets, starting from the offset of
the array
- // so we can take the first value
- let offset = data.buffer::<Offset>(0)[0];
- buf_len - offset.to_usize().unwrap()
- })
- .sum::<usize>();
[
buffer,
- MutableBuffer::new(str_values_size * mem::size_of::<u8>()),
+ MutableBuffer::new(binary_size * mem::size_of::<u8>()),
]
}
+/// Define capacities of child data or data buffers.
+#[derive(Debug, Clone)]
+pub enum Capacities {
+ /// Binary, Utf8 and LargeUtf8 data types
+ /// Define
+ /// * the capacity of the array offsets
+ /// * the capacity of the binary/ str buffer
+ Binary(usize, Option<usize>),
+ /// List and LargeList data types
+ /// Define
+ /// * the capacity of the array offsets
+ /// * the capacity of the child data
+ List(usize, Option<Box<Capacities>>),
+ /// Struct type
+ /// * the capacity of the array
+ /// * the capacities of the fields
+ Struct(usize, Option<Vec<Capacities>>),
+ /// Dictionary type
+ /// * the capacity of the array
Review comment:
> What is the difference here between the "capacity of the array" and
"the capacity of the keys"?
Yes, that can be the same.
I also understand the use of defining the merge strategy. Regarding the
API, should we make that available in a different method. Similar to `HashMap`
having a method:
```rust
HashMap::with_capacity(...)
// and
HashMap::with_capacity_and_hasher(...)
```
We could do
```rust
MutableArrayData::with_capacies(...)
// and
MutableArrayData::with_capacities_and_merge_strategy(...)
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]