andygrove commented on issue #540:
URL: https://github.com/apache/datafusion-comet/issues/540#issuecomment-2156065507
The error is happening in unsafe code in arrow-rs.
Here is some debug output showing the calls leading up to the error:
```
copy_or_cast_array() len=8192, type=Utf8
copy_array(typeUtf8, len=8192) before mutable.extend(0, 0, 8192) data len = 8192
getNextBatch id=2284, plan=135761185342032 failed during native execution:
range end index 294912 out of range for slice of length 147456
```
Note that there are many earlier calls that look identical and do not fail.
The error happens in `arrow_data::transform::variable_size::build_extend`:
```
at core::panicking::panic_fmt(__internal__:0)
at core::slice::index::slice_end_index_len_fail(__internal__:0)
at arrow_data::transform::variable_size::build_extend::{{closure}}(__internal__:0)
at arrow_data::transform::MutableArrayData::extend(__internal__:0)
at comet::execution::operators::copy_array(__internal__:0)
at comet::execution::operators::copy_or_cast_array(__internal__:0)
```
This function calls `get_last_offset` to get the last offset and there are
some documented assumptions. I wonder if we are violating any of these?
```rust
pub(super) unsafe fn get_last_offset<T: ArrowNativeType>(
    offset_buffer: &MutableBuffer,
) -> T {
    // JUSTIFICATION
    //  Benefit
    //      20% performance improvement extend of variable sized arrays (see bench `mutable_array`)
    //  Soundness
    //      * offset buffer is always extended in slices of T and aligned accordingly.
    //      * Buffer[0] is initialized with one element, 0, and thus `mutable_offsets.len() - 1` is always valid.
    let (prefix, offsets, suffix) = offset_buffer.as_slice().align_to::<T>();
    debug_assert!(prefix.is_empty() && suffix.is_empty());
    *offsets.get_unchecked(offsets.len() - 1)
}
```
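One way to narrow this down might be to check the source array's offsets against its values buffer before calling `mutable.extend`. The sketch below is only an illustration: `check_utf8_offsets` is a hypothetical helper, not part of arrow-rs or Comet, and it assumes the panic comes from slicing the values buffer with an end offset taken from the source offsets. Note that the failing index in the error above (294912) is exactly twice the slice length (147456).

```rust
/// Hypothetical helper (not part of arrow-rs or Comet): checks that the i32
/// offsets of a Utf8 array are consistent with the length of its values buffer.
pub fn check_utf8_offsets(offsets: &[i32], values_len: usize) -> Result<(), String> {
    // A well-formed offsets buffer always has at least one entry.
    let first = *offsets
        .first()
        .ok_or_else(|| "offsets buffer is empty".to_string())?;
    if first < 0 {
        return Err(format!("first offset {first} is negative"));
    }

    // Offsets must never decrease from one entry to the next.
    for (i, pair) in offsets.windows(2).enumerate() {
        if pair[1] < pair[0] {
            return Err(format!(
                "offset {} ({}) is smaller than offset {} ({})",
                i + 1,
                pair[1],
                i,
                pair[0]
            ));
        }
    }

    // The last offset is used as the end of the values slice, so it must
    // not exceed the number of bytes actually present in the values buffer.
    let last = offsets[offsets.len() - 1] as usize;
    if last > values_len {
        return Err(format!(
            "last offset {last} exceeds values buffer length {values_len}"
        ));
    }
    Ok(())
}
```

Calling something like this on the incoming array in `copy_array`, just before `mutable.extend(0, 0, 8192)`, should show whether the input batch is already inconsistent when it reaches us. arrow-rs also has `ArrayData::validate_full`, which I believe performs similar checks and may be the simpler option.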