westonpace opened a new issue, #7379:
URL: https://github.com/apache/arrow-rs/issues/7379

   **Describe the bug**
   From the documentation for `ArrayData::nulls` we can see that the 
`ArrayData` offset should be ignored when handling the null buffer.  Most 
methods work this way.  However, in the `validate` method, we are validating 
the null buffers physical length and using the `ArrayData` offset to do the 
validation:
   
   ```
   impl ArrayData {
       ...
       pub fn validate(&self) -> Result<(), ArrowError> {
           // Need at least this mich space in each buffer
           let len_plus_offset = self.len + self.offset;
           ...
   
           // check null bit buffer size
           if let Some(nulls) = self.nulls() {
               if nulls.null_count() > self.len {
                   return Err(ArrowError::InvalidArgumentError(format!(
                       "null_count {} for an array exceeds length of {} 
elements",
                       nulls.null_count(),
                       self.len
                   )));
               }
   
               let actual_len = nulls.validity().len();
   
               // ERROR HERE: len_plus_offset is the using the ArrayData offset 
and not the null
               // buffer offset.  As a result, needed_len can be too large in 
some cases (false error)
               // or too small in some cases (potentially unsafe)
   
               let needed_len = bit_util::ceil(len_plus_offset, 8);
               if actual_len < needed_len {
                   return Err(ArrowError::InvalidArgumentError(format!(
                       "null_bit_buffer size too small. got {actual_len} needed 
{needed_len}",
                   )));
               }
   
               if nulls.len() != self.len {
                   return Err(ArrowError::InvalidArgumentError(format!(
                       "null buffer incorrect size. got {} expected {}",
                       nulls.len(),
                       self.len
                   )));
               }
           }
           ...
       }
   ...
   }
   ```
   
   **To Reproduce**
   ```
       #[test]
       fn test_null_buffer_no_offset() {
           // Create an ArrayData for an int array with 100 underlying elements 
sliced with an offset
           // 50 to create an array of 50 elements.
           let int_data_builder = ArrayData::builder(DataType::UInt32)
               .offset(50)
               .len(50)
               .add_buffer(Buffer::from_vec(vec![0_u32; 100]));
           let int_data = int_data_builder.build().unwrap();
           int_data.validate().unwrap();
   
           // Using a sliced null buffer works because the underlying values 
buffer is long enough
           let null_buffer = NullBuffer::new(BooleanBuffer::from(vec![false; 
100]).slice(0, 50));
           int_data
               .clone()
               .into_builder()
               .nulls(Some(null_buffer))
               .build()
               .unwrap()
               .validate()
               .unwrap();
   
           // Using a non-sliced null buffer does not work but maybe should?
           let null_buffer = NullBuffer::new(BooleanBuffer::from(vec![false; 
50]));
           int_data
               .clone()
               .into_builder()
               .nulls(Some(null_buffer))
               .build()
               // Panics here: called `Result::unwrap()` on an `Err` value: 
InvalidArgumentError("null_bit_buffer size too small. got 7 needed 13")
               .unwrap()
               .validate()
               .unwrap();
       }
   ```
   
   **Expected behavior**
   The comparison should use the null buffer's offset and not the array data's 
offset.
   
   **Additional context**
   This error is not likely to be encountered in practice.  Users are not 
typically creating `ArrayData` from scratch.  In most cases they are receiving 
the `ArrayData` from the various builders or from FFI.  All of these paths are 
safe.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to