ritchie46 opened a new issue #502:
URL: https://github.com/apache/arrow-rs/issues/502


   **Describe the bug**
   Slicing an array can lead to incorrect null buffers. I found this in code that copied the null buffer of one array into a new array (one without an offset).
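
To illustrate the suspected mechanism, here is a minimal sketch in plain Rust (no arrow dependency). It assumes the Arrow validity layout, where bit `i` of an LSB-first bitmap marks slot `i` as valid. The helpers `get_bit`, `read_bits` and `rebase_validity` are hypothetical names for this illustration, not arrow-rs APIs, and this is not the actual kernel code.

```rust
/// Returns true if bit `i` is set in an LSB-first bitmap.
fn get_bit(bitmap: &[u8], i: usize) -> bool {
    (bitmap[i / 8] >> (i % 8)) & 1 == 1
}

/// Reads the first `len` bits of a bitmap, starting at bit 0.
fn read_bits(bitmap: &[u8], len: usize) -> Vec<bool> {
    (0..len).map(|i| get_bit(bitmap, i)).collect()
}

/// Copies `len` validity bits starting at `offset` into a new bitmap
/// that starts at bit 0, which is what an offset-free array needs.
fn rebase_validity(bitmap: &[u8], offset: usize, len: usize) -> Vec<u8> {
    let mut out = vec![0u8; (len + 7) / 8];
    for i in 0..len {
        if get_bit(bitmap, offset + i) {
            out[i / 8] |= 1 << (i % 8);
        }
    }
    out
}

fn main() {
    // Validity for [0, 1, 2, 3, 4, 5, 6, null, null]:
    // bits 0..=6 are set, bits 7 and 8 are clear.
    let bitmap = [0b0111_1111u8, 0b0000_0000];

    // Honouring the slice offset (5) keeps the two nulls.
    let rebased = rebase_validity(&bitmap, 5, 4);
    // Copying the bytes as-is and reading from bit 0 loses them.
    let naive = bitmap.to_vec();

    println!("{:?}", read_bits(&rebased, 4)); // [true, true, false, false]
    println!("{:?}", read_bits(&naive, 4)); // [true, true, true, true]
}
```

If the offset is dropped anywhere along the way, the four slots of the slice are read from bits 0 to 3 of the parent bitmap, which are all set, so the nulls disappear.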
   
   I could pinpoint this bug in the `take` kernel, but I think it may occur in 
other parts as well.
   
   In the code snippet below, the output array contains two zero values where there should be nulls. Because these are slots that should not have their validity bits set, the underlying values are arbitrary and could be anything at runtime.
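
A rough model of why zeros in particular show up, again in plain Rust. `take_model` is a hypothetical stand-in for the kernel, and the sketch assumes the two null index slots happen to store 0 as their underlying value; neither is confirmed by the arrow-rs source.

```rust
/// Hypothetical model of `take`: a null index yields a null output,
/// otherwise the value at that index is copied.
fn take_model(values: &[i32], indices: &[u32], index_valid: &[bool]) -> Vec<Option<i32>> {
    indices
        .iter()
        .zip(index_valid)
        .map(|(&i, &valid)| if valid { Some(values[i as usize]) } else { None })
        .collect()
}

fn main() {
    let values = [0, 1, 2, 3, 4, 5, 6, 7, 8];
    // Assumption: the null index slots hold 0 as their raw value.
    let indices = [5u32, 6, 0, 0];

    // Validity read correctly: the last two outputs are null.
    let correct = take_model(&values, &indices, &[true, true, false, false]);
    // Validity misread as all-valid: the raw 0 indices are followed.
    let misread = take_model(&values, &indices, &[true, true, true, true]);

    println!("{:?}", correct); // [Some(5), Some(6), None, None]
    println!("{:?}", misread); // [Some(5), Some(6), Some(0), Some(0)]
}
```

Under that assumption the misread case reproduces the `Some(0)` entries from the failing assertion; with different raw values in the null slots the result would differ.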
   
   *I could also reproduce this with a StringArray.*
   
   **To Reproduce**
   ```rust
   use arrow::array::{UInt32Array, Array, LargeStringArray, PrimitiveArray, Int32Array};
   use arrow::compute::take;
   use arrow::datatypes::UInt32Type;
   
   fn main() {
       let idx0 = UInt32Array::from(vec![Some(0), Some(1), Some(2), Some(3), Some(4), Some(5), Some(6), None, None]);
       let sliced_idx_arr = idx0.slice(5, 4);
       let sliced_idx = sliced_idx_arr.as_any().downcast_ref::<UInt32Array>().unwrap();
   
       let idx1 = UInt32Array::from(vec![Some(5), Some(6), None, None]);
   
       println!("indices:\n{:?}\n{:?}\n\n\n", &idx1, &sliced_idx);
       // this is true
       assert_eq!(&sliced_idx.into_iter().collect::<Vec<_>>(), &idx1.into_iter().collect::<Vec<_>>());
   
   
       let array = Int32Array::from(vec![0, 1, 2, 3, 4, 5, 6, 7, 8]);
       let taken1_arr = take(&array, &idx1, None).unwrap();
       let sliced_taken_arr = take(&array, &sliced_idx, None).unwrap();
   
       let taken1 = taken1_arr.as_any().downcast_ref::<Int32Array>().unwrap();
       let sliced_taken = sliced_taken_arr.as_any().downcast_ref::<Int32Array>().unwrap();
   
       let vec1: Vec<_> = taken1.into_iter().collect();
       let sliced_vec: Vec<_> = sliced_taken.into_iter().collect();
   
       println!("taken arrays:\n{:?}\n{:?}", &taken1, &sliced_taken);
   
       assert_eq!(vec1, sliced_vec)
   }
   ```
   
   **Output**
   
   ```
   indices:
   PrimitiveArray<UInt32>
   [
     5,
     6,
     null,
     null,
   ]
   PrimitiveArray<UInt32>
   [
     5,
     6,
     null,
     null,
   ]
   
   
   
   taken arrays:
   PrimitiveArray<Int32>
   [
     5,
     6,
     null,
     null,
   ]
   PrimitiveArray<Int32>
   [
     5,
     6,
     0,
     0,
   ]
   thread 'main' panicked at 'assertion failed: `(left == right)`
     left: `[Some(5), Some(6), None, None]`,
    right: `[Some(5), Some(6), Some(0), Some(0)]`', src/main.rs:29:5
   note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
   
   ```
   
   **Expected behavior**
   I'd expect `take` with the sliced index array to give the same result as `take` with the equivalent unsliced index array.
   
   **Additional context**
   https://github.com/pola-rs/polars/issues/878
   


