Re: [PR] Allow constructing ByteViewArray from existing blocks [arrow-rs]

via GitHub Tue, 28 May 2024 11:57:34 -0700


alamb commented on code in PR #5796:
URL: https://github.com/apache/arrow-rs/pull/5796#discussion_r1617761742



##########
arrow-array/src/builder/generic_bytes_view_builder.rs:
##########
@@ -62,6 +83,98 @@ impl<T: ByteViewType + ?Sized> GenericByteViewBuilder<T> {
         Self { block_size, ..self }
     }
 
+    /// Append a new data block returning the new block offset
+    ///
+    /// Note: this will first flush any in-progress block
+    ///
+    /// This allows appending views from blocks added using [`Self::append_block`]. See
+    /// [`Self::append_value`] for appending individual values
+    ///
+    /// ```
+    /// # use arrow_array::builder::StringViewBuilder;
+    /// let mut builder = StringViewBuilder::new();
+    ///
+    /// let block = builder.append_block(b"helloworldbingobongo".into());
+    ///
+    /// builder.try_append_view(block, 0, 5).unwrap();
+    /// builder.try_append_view(block, 5, 5).unwrap();
+    /// builder.try_append_view(block, 10, 5).unwrap();
+    /// builder.try_append_view(block, 15, 5).unwrap();
+    /// builder.try_append_view(block, 0, 15).unwrap();
+    /// let array = builder.finish();
+    ///
+    /// let actual: Vec<_> = array.iter().flatten().collect();
+    /// let expected = &["hello", "world", "bingo", "bongo", "helloworldbingo"];
+    /// assert_eq!(actual, expected);
+    /// ```
+    pub fn append_block(&mut self, buffer: Buffer) -> u32 {
+        assert!(buffer.len() < u32::MAX as usize);
+
+        self.flush_in_progress();
+        let offset = self.completed.len();
+        self.push_completed(buffer);
+        offset as u32
+    }
+
+    /// Try to append a view of the given `block`, `offset` and `length`
+    ///
+    /// See [`Self::append_block`]
+    pub fn try_append_view(&mut self, block: u32, offset: u32, len: u32) -> Result<(), ArrowError> {

Review Comment:
   It seems we have a filter benchmark, but no benchmark for raw array creation speed:
   
   https://github.com/apache/arrow-rs/blob/9828bf0bcd3e54ff5c51154ee99d183d9ee171fa/arrow/src/util/bench_util.rs#L141
   
   I agree: let's start with the checked version, then add benchmarks (such as reading from Parquet). If they show slowdowns, we can add unchecked versions.
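
   To make the trade-off concrete, here is a rough, std-only sketch of what a checked append has to validate, plus a crude timing probe. It is not the arrow-rs implementation: `ToyViewBuilder`, `ToyView`, and `time_checked_appends` are hypothetical names made up for illustration, and the real builder stores packed views and `Buffer`s rather than `Vec<u8>`. A proper benchmark would presumably use a harness rather than `Instant`.

```rust
// Simplified stand-in for the builder in the diff above; NOT the arrow-rs API.
// All type and function names here are hypothetical.
use std::time::{Duration, Instant};

/// A view referencing `len` bytes starting at `offset` inside block `block`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ToyView {
    pub block: u32,
    pub offset: u32,
    pub len: u32,
}

#[derive(Default)]
pub struct ToyViewBuilder {
    blocks: Vec<Vec<u8>>,
    views: Vec<ToyView>,
}

impl ToyViewBuilder {
    /// Store a block and return its index, mirroring the shape of `append_block`.
    pub fn append_block(&mut self, data: Vec<u8>) -> u32 {
        assert!(data.len() < u32::MAX as usize);
        let idx = self.blocks.len();
        self.blocks.push(data);
        idx as u32
    }

    /// Checked append: validates the block index, the byte range, and UTF-8
    /// before recording the view. These checks are the cost a benchmark would measure.
    pub fn try_append_view(&mut self, block: u32, offset: u32, len: u32) -> Result<(), String> {
        let data = self
            .blocks
            .get(block as usize)
            .ok_or_else(|| format!("no block at index {block}"))?;
        let start = offset as usize;
        let end = start.saturating_add(len as usize);
        let bytes = data.get(start..end).ok_or_else(|| {
            format!("range {start}..{end} out of bounds for block of length {}", data.len())
        })?;
        std::str::from_utf8(bytes).map_err(|e| e.to_string())?;
        self.views.push(ToyView { block, offset, len });
        Ok(())
    }

    /// Resolve every stored view back to a string slice.
    pub fn values(&self) -> Vec<&str> {
        self.views
            .iter()
            .map(|v| {
                let block = &self.blocks[v.block as usize];
                let start = v.offset as usize;
                // Safe to unwrap: every view was validated in `try_append_view`.
                std::str::from_utf8(&block[start..start + v.len as usize]).unwrap()
            })
            .collect()
    }
}

/// Time `n` checked appends of the same 5-byte view.
/// Returns the elapsed time and the number of views recorded.
pub fn time_checked_appends(n: usize) -> (Duration, usize) {
    let mut builder = ToyViewBuilder::default();
    let block = builder.append_block(b"helloworldbingobongo".to_vec());
    let start = Instant::now();
    for _ in 0..n {
        builder.try_append_view(block, 0, 5).unwrap();
    }
    (start.elapsed(), builder.views.len())
}
```

   Comparing `time_checked_appends` against a variant that skips the three validations would give a first hint of whether unchecked versions are worth adding, though only a benchmark against the real builder can settle that.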



