tustvold commented on code in PR #6616:
URL: https://github.com/apache/arrow-rs/pull/6616#discussion_r1813122957


##########
arrow-select/src/take.rs:
##########
@@ -2382,4 +2389,29 @@ mod tests {
         let array = take(&array, &indicies, None).unwrap();
         assert_eq!(array.len(), 3);
     }
+
+    #[test]
+    fn test_take_bytes_null_indices_modified() {
+        let indices = Int32Array::new(
+            vec![0, 0, 0, 0].into(),
+            Some(NullBuffer::from_iter(vec![false, true, true, true])),
+        );
+        let values = StringArray::from(vec![Some("foo")]);
+        let r = take(&values, &indices, None).unwrap();
+
+        // Modify indices null buffer
+        let (_, _, nulls) = indices.into_parts();
+        assert!(nulls.is_some());
+        let null_buffer = nulls.unwrap();
+        let binding = null_buffer.into_inner().into_inner();
+        let null_slice = binding.data_ptr();
+        for i in 0..4 {
+            unsafe {
+                bit_util::unset_bit_raw(null_slice.as_ptr(), i);

Review Comment:
   > This just simulates the case that the null buffer is modified after take 
is called.
   
   Which is UB: you are mutating a buffer without exclusive access to it. The 
optimisation of reusing the buffer is perfectly sound because of Rust's aliasing 
invariants. Not only do we rely on these invariants extensively within both arrow 
and DataFusion, but the compiler itself relies on them.
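   
   As a rough illustration of the aliasing point, here is a minimal std-only sketch. It does not use arrow's actual buffer types: `Arc<Vec<u8>>` stands in for a reference-counted null buffer, and `try_unset_bit` is a hypothetical helper. Safe code can only mutate the bytes while it holds the sole reference, which is the guarantee the raw-pointer write in the test bypasses.

```rust
use std::sync::Arc;

/// Hypothetical helper: clears bit `i` only if `buf` is not shared.
/// `Arc<Vec<u8>>` is an assumed stand-in for a reference-counted buffer.
fn try_unset_bit(buf: &mut Arc<Vec<u8>>, i: usize) -> bool {
    match Arc::get_mut(buf) {
        // Sole owner: exclusive access is guaranteed, so mutation is sound.
        Some(bytes) => {
            bytes[i / 8] &= !(1 << (i % 8));
            true
        }
        // Another handle exists: mutating here would break aliasing rules.
        None => false,
    }
}

fn main() {
    let mut nulls = Arc::new(vec![0b0000_1110u8]);

    // Simulates a result that reuses the same buffer instead of copying it.
    let shared = Arc::clone(&nulls);

    // While the buffer is shared, safe code cannot modify it.
    assert!(!try_unset_bit(&mut nulls, 1));
    assert_eq!(shared[0], 0b0000_1110);

    // Once the other handle is gone, mutation is permitted again.
    drop(shared);
    assert!(try_unset_bit(&mut nulls, 1));
    assert_eq!(nulls[0], 0b0000_1100);
}
```

   Under this model, reusing the buffer needs no defensive copy: anyone who wants to change the bits must first prove there is no other reader.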



