andygrove commented on issue #540:
URL: 
https://github.com/apache/datafusion-comet/issues/540#issuecomment-2156065507

   The error is happening in unsafe code in arrow-rs.
   
   Here is some debug output showing the calls leading up to the error:
   
   ```
   copy_or_cast_array() len=8192, type=Utf8
   copy_array(typeUtf8, len=8192) before mutable.extend(0, 0, 8192) data len = 8192
   getNextBatch id=2284, plan=135761185342032 failed during native execution: range end index 294912 out of range for slice of length 147456
   ```
   
   Note that there are many earlier calls that look identical and do not fail.
   
   The error happens in `arrow_data::transform::variable_size::build_extend`:
   
   ```
           at core::panicking::panic_fmt(__internal__:0)
           at core::slice::index::slice_end_index_len_fail(__internal__:0)
           at arrow_data::transform::variable_size::build_extend::{{closure}}(__internal__:0)
           at arrow_data::transform::MutableArrayData::extend(__internal__:0)
           at comet::execution::operators::copy_array(__internal__:0)
           at comet::execution::operators::copy_or_cast_array(__internal__:0)
   ```
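
   To make the failure mode concrete, here is a minimal sketch of how copying a range of variable-size values can hit this kind of out-of-range slice. It is a simplified model, not the arrow-rs implementation: `value_range` and `copy_values` are hypothetical helpers, and I am assuming `i32` offsets as used by `Utf8`. The end of the slice is taken from the offsets, so an offset larger than the values buffer produces an out-of-range end index. For what it is worth, the reported end index (294912) is exactly twice the slice length (147456), which may or may not be meaningful.

   ```rust
   /// Hypothetical helper: byte range covered by `len` elements starting at
   /// element `start`, read from the offsets. Returns None if the offsets
   /// are too short, negative, or decreasing.
   fn value_range(offsets: &[i32], start: usize, len: usize) -> Option<(usize, usize)> {
       let first = *offsets.get(start)?;
       let last = *offsets.get(start.checked_add(len)?)?;
       if first < 0 || last < first {
           return None;
       }
       Some((first as usize, last as usize))
   }

   /// Hypothetical helper: checked version of the value copy. Instead of
   /// panicking on `values[begin..end]`, it reports the same condition as an Err.
   fn copy_values<'a>(
       offsets: &[i32],
       values: &'a [u8],
       start: usize,
       len: usize,
   ) -> Result<&'a [u8], String> {
       let (begin, end) = value_range(offsets, start, len)
           .ok_or_else(|| "offsets are malformed or too short".to_string())?;
       values.get(begin..end).ok_or_else(|| {
           format!(
               "range end index {} out of range for slice of length {}",
               end,
               values.len()
           )
       })
   }
   ```

   Running something like this over the source array just before `mutable.extend` would show whether the offsets already point past the end of the values buffer on the failing batch.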
   
   This function calls `get_last_offset` to read the last offset, and that function documents some soundness assumptions. I wonder if we are violating any of them?
   
   
   ```rust
   pub(super) unsafe fn get_last_offset<T: ArrowNativeType>(offset_buffer: &MutableBuffer) -> T {
       // JUSTIFICATION
       //  Benefit
       //      20% performance improvement extend of variable sized arrays (see bench `mutable_array`)
       //  Soundness
       //      * offset buffer is always extended in slices of T and aligned accordingly.
       //      * Buffer[0] is initialized with one element, 0, and thus `mutable_offsets.len() - 1` is always valid.
       let (prefix, offsets, suffix) = offset_buffer.as_slice().align_to::<T>();
       debug_assert!(prefix.is_empty() && suffix.is_empty());
       *offsets.get_unchecked(offsets.len() - 1)
   }
   ```
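
   One way to test those assumptions would be a checked stand-in for the unchecked read. The sketch below is only an illustration: `checked_last_offset` is a hypothetical helper, it takes a plain byte slice rather than a `MutableBuffer`, and it assumes `i32` offsets. It verifies that the buffer is non-empty and that its length is a whole number of offsets. Because it copies the last four bytes instead of reinterpreting the buffer, it does not depend on alignment, so it says nothing about the alignment assumption.

   ```rust
   use std::mem::size_of;

   /// Hypothetical checked variant of reading the last i32 offset from a raw
   /// offset buffer. Errors instead of invoking undefined behaviour when the
   /// documented assumptions do not hold.
   fn checked_last_offset(bytes: &[u8]) -> Result<i32, String> {
       let width = size_of::<i32>();
       // Assumption: the buffer always holds at least the initial 0 offset.
       if bytes.is_empty() {
           return Err("offset buffer is empty".to_string());
       }
       // Assumption: the buffer is only ever extended in whole offsets.
       if bytes.len() % width != 0 {
           return Err(format!(
               "offset buffer length {} is not a multiple of {}",
               bytes.len(),
               width
           ));
       }
       // Copy the trailing bytes out, so no alignment requirement applies.
       let n = bytes.len();
       let tail = [bytes[n - 4], bytes[n - 3], bytes[n - 2], bytes[n - 1]];
       Ok(i32::from_ne_bytes(tail))
   }
   ```

   If a check like this passes on the failing batch, the last offset itself is probably fine and the input array's offsets or values buffer is the more likely culprit.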


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

