jorgecarleitao commented on a change in pull request #8960:
URL: https://github.com/apache/arrow/pull/8960#discussion_r548387772



##########
File path: rust/arrow/src/array/transform/variable_size.rs
##########
@@ -43,33 +47,33 @@ pub(super) fn build_extend<T: OffsetSizeTrait>(array: 
&ArrayData) -> Extend {
         // fast case where we can copy regions without null issues
         Box::new(
             move |mutable: &mut _MutableArrayData, _, start: usize, len: 
usize| {
-                let mutable_offsets = mutable.buffer::<T>(0);
-                let last_offset = mutable_offsets[mutable_offsets.len() - 1];
-                // offsets
-                let buffer = &mut mutable.buffers[0];
+                let offset_buffer = &mut mutable.buffer1;
+                let values_buffer = &mut mutable.buffer2;
+
+                // this is safe due to how offset is built. See details on 
`get_last_offset`
+                let last_offset = unsafe { get_last_offset(offset_buffer) };

Review comment:
       I get your point.
   
   `get_last_offset` must only be used in offset buffers, i.e. buffers whose 
bytes represent a i32 or i64 and were specifically built from those types. 
Doing so in other buffers is undefined behavior (even with the safeguards of 
using `align_to`). We can remove the `unsafe` mark from `get_last_offset`, 
though.
   
   My proposal is that we refactor the `src/transform` code so that it has a 
struct specific for each array type (that implements some trait for `dyn` 
support). This will allow `last_offset` to be stored in the array-specific 
struct, thereby avoiding this problem altogether (of having to read bytes 
written to the `MutableBuffer`).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to