andygrove commented on issue #540: URL: https://github.com/apache/datafusion-comet/issues/540#issuecomment-2156065507
The error is happening in unsafe code in arrow-rs. Here is some debug output showing the calls leading up to the error:

```
copy_or_cast_array() len=8192, type=Utf8
copy_array(typeUtf8, len=8192)
before mutable.extend(0, 0, 8192) data len = 8192
getNextBatch id=2284, plan=135761185342032 failed during native execution: range end index 294912 out of range for slice of length 147456
```

Note that there are many earlier calls that look identical and do not fail.

The error happens in `arrow_data::transform::variable_size::build_extend`:

```
at core::panicking::panic_fmt(__internal__:0)
at core::slice::index::slice_end_index_len_fail(__internal__:0)
at arrow_data::transform::variable_size::build_extend::{{closure}}(__internal__:0)
at arrow_data::transform::MutableArrayData::extend(__internal__:0)
at comet::execution::operators::copy_array(__internal__:0)
at comet::execution::operators::copy_or_cast_array(__internal__:0)
```

This function calls `get_last_offset` to get the last offset, and that function documents some assumptions. I wonder if we are violating any of them?

```rust
pub(super) unsafe fn get_last_offset<T: ArrowNativeType>(offset_buffer: &MutableBuffer) -> T {
    // JUSTIFICATION
    //  Benefit
    //      20% performance improvement extend of variable sized arrays (see bench `mutable_array`)
    //  Soundness
    //      * offset buffer is always extended in slices of T and aligned accordingly.
    //      * Buffer[0] is initialized with one element, 0, and thus `mutable_offsets.len() - 1` is always valid.
    let (prefix, offsets, suffix) = offset_buffer.as_slice().align_to::<T>();
    debug_assert!(prefix.is_empty() && suffix.is_empty());
    *offsets.get_unchecked(offsets.len() - 1)
}
```

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
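One way to narrow this down might be to validate the source array's offsets against its values buffer before calling `mutable.extend`. The following is only a minimal sketch using plain slices: `check_utf8_offsets` is a hypothetical helper, not an arrow-rs or Comet function, and it assumes the `i32` offsets and the values buffer length have already been extracted from the array.

```rust
/// Hypothetical helper (not part of arrow-rs or Comet): checks the invariants
/// that slicing a Utf8 values buffer by its offsets relies on.
fn check_utf8_offsets(offsets: &[i32], values_len: usize) -> Result<(), String> {
    // A valid offset buffer always holds at least one element.
    let first = *offsets.first().ok_or("offset buffer is empty")?;
    if first < 0 {
        return Err(format!("first offset {} is negative", first));
    }
    // Offsets must never decrease from one element to the next.
    for (i, pair) in offsets.windows(2).enumerate() {
        if pair[1] < pair[0] {
            return Err(format!(
                "offsets decrease at index {}: {} -> {}",
                i + 1,
                pair[0],
                pair[1]
            ));
        }
    }
    // The last offset is the end of the range that gets sliced out of the
    // values buffer, so it must not exceed the buffer length.
    let last = *offsets.last().unwrap() as usize;
    if last > values_len {
        return Err(format!(
            "range end index {} out of range for slice of length {}",
            last, values_len
        ));
    }
    Ok(())
}
```

If a check like this fails for the batch that panics but passes for the earlier ones, that would suggest the incoming array is already inconsistent before it reaches `MutableArrayData`, rather than the offset arithmetic inside `build_extend` being at fault.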
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org