tustvold commented on code in PR #4817:
URL: https://github.com/apache/arrow-rs/pull/4817#discussion_r1326350700
##########
arrow-row/src/lib.rs:
##########
@@ -1002,6 +1002,46 @@ impl Rows {
+ self.buffer.len()
+ self.offsets.len() * std::mem::size_of::<usize>()
}
+
+ /// Return a new Rows sliced according to `offset` and `length`
+ /// This copies memory from the original Rows
+ ///
+ /// # Panics
+ ///
+ /// Panics if `offset` with `length` is greater than row length.
+ pub fn slice(&self, offset: usize, length: usize) -> Self {
+ assert!(
+ offset + length <= self.num_rows(),
+ "offset + length is greater than row length"
+ );
+
+ let Rows {
+ buffer: current_buffer,
+ offsets: current_offsets,
+ config,
+ } = self;
+
+ let start = current_offsets[offset];
+ let end = current_offsets[offset + length];
+ let mut buffer: Vec<u8> = vec![0_u8; end - start];
+ buffer.copy_from_slice(¤t_buffer[start..end]);
Review Comment:
Using Vec::with_capacity() and then extend_from_slice should be faster as it
would avoid zero-ing out memory
##########
arrow-row/src/lib.rs:
##########
@@ -1002,6 +1002,46 @@ impl Rows {
+ self.buffer.len()
+ self.offsets.len() * std::mem::size_of::<usize>()
}
+
+ /// Return a new Rows sliced according to `offset` and `length`
+ /// This copies memory from the original Rows
+ ///
+ /// # Panics
+ ///
+ /// Panics if `offset` with `length` is greater than row length.
+ pub fn slice(&self, offset: usize, length: usize) -> Self {
+ assert!(
+ offset + length <= self.num_rows(),
+ "offset + length is greater than row length"
+ );
+
+ let Rows {
+ buffer: current_buffer,
+ offsets: current_offsets,
+ config,
+ } = self;
+
+ let start = current_offsets[offset];
+ let end = current_offsets[offset + length];
+ let mut buffer: Vec<u8> = vec![0_u8; end - start];
+ buffer.copy_from_slice(¤t_buffer[start..end]);
+
+ let mut offsets: Vec<usize> = vec![0_usize; length + 1];
+ offsets.copy_from_slice(¤t_offsets[offset..offset + length + 1]);
+ // mutate offsets to match new buffer length
+ offsets = offsets.iter_mut().map(|x| *x - start).collect();
+
+ if length > 0 {
+ assert_eq!(offsets.last().unwrap(), &buffer.len());
Review Comment:
These are just sanity checks correct?
##########
arrow-row/src/lib.rs:
##########
@@ -1002,6 +1002,46 @@ impl Rows {
+ self.buffer.len()
+ self.offsets.len() * std::mem::size_of::<usize>()
}
+
+ /// Return a new Rows sliced according to `offset` and `length`
+ /// This copies memory from the original Rows
+ ///
+ /// # Panics
+ ///
+ /// Panics if `offset` with `length` is greater than row length.
+ pub fn slice(&self, offset: usize, length: usize) -> Self {
+ assert!(
+ offset + length <= self.num_rows(),
Review Comment:
```suggestion
offset.saturating_add(length) <= self.num_rows(),
```
Otherwise this will misbehave on overflow
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]