Dandandan commented on code in PR #7309:
URL: https://github.com/apache/arrow-rs/pull/7309#discussion_r2023700374
##########
arrow-array/src/builder/generic_bytes_builder.rs:
##########
@@ -129,6 +129,48 @@ impl<T: ByteArrayType> GenericByteBuilder<T> {
self.offsets_builder.append(self.next_offset());
}
+ /// Appends array values and null to this builder as is
+ /// (this means that underlying null values are copied as is).
+ #[inline]
+ pub fn append_array(&mut self, array: &GenericByteArray<T>) {
+ if array.len() == 0 {
+ return;
+ }
+
+ let offsets = array.offsets();
+
+ // If the offsets are contiguous, we can append them directly avoiding the need to align
+ // them
+ if self.next_offset() == offsets[0] {
+ self.offsets_builder.append_slice(&offsets[1..]);
+ } else {
+ // Shifting all the offsets
+ let shift: T::Offset = self.next_offset() - offsets[0];
+
+ // Creating intermediate offsets instead of pushing each offset is faster
+ // (even if we make MutableBuffer to avoid updating length on each push
+ // and reserve the necessary capacity, it's still slower)
+ let mut intermediate = Vec::with_capacity(offsets.len() - 1);
Review Comment:
I think many of these instances could be changed to use `Vec` directly (and
benefit from its optimized `extend`, etc.)
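
As a rough illustration of the `Vec`-based approach, here is a minimal sketch that uses only the standard library. `shift_offsets` is a hypothetical helper, not an arrow-rs API, and it fixes the offset type to `i32` for simplicity where the builder is generic over `T::Offset`. It rebases a slice of offsets onto the builder's next offset and fills a pre-sized `Vec` through `extend`:

```rust
/// Hypothetical helper (not part of arrow-rs): rebases `offsets` so that the
/// first offset lands on `next_offset`, and returns every offset after the
/// first one, shifted by the same amount.
fn shift_offsets(next_offset: i32, offsets: &[i32]) -> Vec<i32> {
    // Nothing to shift when the input has no offsets at all.
    let Some((&first, rest)) = offsets.split_first() else {
        return Vec::new();
    };

    // Distance between where the builder currently ends and where the
    // incoming array's offsets start.
    let shift = next_offset - first;

    // Reserve once, then let `extend` fill the vector from an iterator
    // whose length is known up front.
    let mut shifted = Vec::with_capacity(rest.len());
    shifted.extend(rest.iter().map(|&offset| offset + shift));
    shifted
}
```

For example, `shift_offsets(10, &[3, 5, 9])` yields `[12, 16]`: the shift is `10 - 3 = 7`, and the leading offset is dropped because the builder already holds it.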