jhorstmann commented on a change in pull request #382:
URL: https://github.com/apache/arrow-rs/pull/382#discussion_r644266958
##########
File path: arrow/src/array/transform/mod.rs
##########
@@ -338,33 +338,61 @@ fn preallocate_str_buffer<Offset: StringOffsetSizeTrait>(
} else {
buffer.push(0i32)
}
- let str_values_size = arrays
- .iter()
- .map(|data| {
- // get the length of the value buffer
- let buf_len = data.buffers()[1].len();
- // find the offset of the buffer
- // this returns a slice of offsets, starting from the offset of the array
- // so we can take the first value
- let offset = data.buffer::<Offset>(0)[0];
- buf_len - offset.to_usize().unwrap()
- })
- .sum::<usize>();
[
buffer,
- MutableBuffer::new(str_values_size * mem::size_of::<u8>()),
+ MutableBuffer::new(binary_size * mem::size_of::<u8>()),
]
}
+/// Define capacities of child data or data buffers.
+#[derive(Debug, Clone)]
+pub enum Capacities {
+ /// Binary, Utf8 and LargeUtf8 data types
+ /// Define
+ /// * the capacity of the array offsets
+ /// * the capacity of the binary/str buffer
+ Binary(usize, Option<usize>),
+ /// List and LargeList data types
+ /// Define
+ /// * the capacity of the array offsets
+ /// * the capacity of the child data
+ List(usize, Option<Box<Capacities>>),
+ /// Struct type
+ /// * the capacity of the array
+ /// * the capacities of the fields
+ Struct(usize, Option<Vec<Capacities>>),
+ /// Dictionary type
+ /// * the capacity of the array
Review comment:
I haven't thought this through fully yet. Maybe the caller should
actually specify the merged dictionary values array instead of a strategy.
Either way, passing the merged values or specifying the correct strategy
requires a special concat kernel, so it's basically the same situation as
now. I think you don't need to consider this use case in this PR.
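The removed lines sized the string value buffer as the length of each input's value buffer minus that array's first offset. The new `Capacities::Binary` variant appears to let the caller supply that byte count directly instead. Below is a minimal standalone sketch, not the arrow-rs API. It mirrors only the three variants visible in the diff (the truncated `Dictionary` variant is omitted), uses plain `i32` offset slices and byte slices in place of `ArrayData`, and introduces `values_size` as a hypothetical helper.

```rust
// Hypothetical standalone mirror of the three `Capacities` variants shown in
// the diff. The `Dictionary` variant is truncated there, so it is omitted.
#[derive(Debug, Clone, PartialEq)]
pub enum Capacities {
    /// Binary/Utf8/LargeUtf8: (offset capacity, optional value-buffer bytes)
    Binary(usize, Option<usize>),
    /// List/LargeList: (offset capacity, optional child capacities)
    List(usize, Option<Box<Capacities>>),
    /// Struct: (array capacity, optional per-field capacities)
    Struct(usize, Option<Vec<Capacities>>),
}

/// Hypothetical helper mirroring the removed `str_values_size` logic for one
/// array: value-buffer length minus the array's first offset. Assumes `offsets`
/// already starts at the array's own offset (as `data.buffer::<Offset>(0)` did)
/// and uses `i32` offsets, i.e. Utf8 rather than LargeUtf8.
fn values_size(offsets: &[i32], values: &[u8]) -> usize {
    values.len() - offsets[0] as usize
}

fn main() {
    // Input 1: ["ab", "c"] with offsets [0, 2, 3].
    let a_offsets = [0, 2, 3];
    let a_values = b"abc";
    // Input 2: ["xy", "z"] sliced to start at its second element, so its
    // offsets slice begins at 2.
    let b_offsets = [2, 3];
    let b_values = b"xyz";

    // Sum of value bytes needed across all inputs.
    let bytes = values_size(&a_offsets, a_values) + values_size(&b_offsets, b_values);
    // Number of output elements: each offsets slice has len + 1 entries.
    let len = (a_offsets.len() - 1) + (b_offsets.len() - 1);
    let utf8_caps = Capacities::Binary(len, Some(bytes));

    // Nested capacities for a List<Utf8> and a Struct holding both.
    let list_caps = Capacities::List(len, Some(Box::new(utf8_caps.clone())));
    let struct_caps = Capacities::Struct(len, Some(vec![utf8_caps.clone(), list_caps]));

    println!("{:?}", utf8_caps);
    println!("{:?}", struct_caps);
}
```

Because the subtraction runs to the end of the value buffer, a slice that stops before the last value still reserves the trailing bytes. This can over-allocate but never under-allocates.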
--
This is an automated message from the Apache Git Service.