tustvold opened a new issue #1072:
URL: https://github.com/apache/arrow-rs/issues/1072
**Describe the bug**
The string comparison kernels panic if they encounter non-monotonic offsets,
even if the values for which they are encountered are null. Currently
`ArrayDataBuilder`'s validation catches such arrays, however, I think this is
actually a bug (#1071) as I don't think the arrow specification forbids nulls
from having arbitrary offsets. I encountered this whilst working on a parquet
StringArray decoder where I was hoping to leave the offsets for null values
default zero-initialized and save some cycles.
**To Reproduce**
```
// Test handling of nulls with arbitrary offsets
let offsets = vec![2, 0, 2, 2];
let validity = vec![false, true, false];
let values = "ab";
let mut offsets_buffer = MutableBuffer::new(offsets.len() * 4);
offsets_buffer.extend_from_slice(&offsets);
let validity_buffer = MutableBuffer::from_iter(validity.iter().cloned());
let mut values_buffer = MutableBuffer::new(values.len());
values_buffer.extend_from_slice(values.as_bytes());
let builder = ArrayDataBuilder::new(DataType::Utf8)
.len(validity.len())
.add_buffer(offsets_buffer.into())
.add_buffer(values_buffer.into())
.null_bit_buffer(validity_buffer.into());
// Validation prevents non-monotonic offsets (#1071)
let data = unsafe { builder.build_unchecked() };
let string = StringArray::from(data);
let eq = eq_utf8(&string, &string).unwrap();
assert!(min_boolean(&eq).unwrap())
```
**Expected behavior**
I would expect the above to pass, all non-null values have valid offsets.
Instead it panics with
```
called `Option::unwrap()` on a `None` value
thread 'compute::kernels::comparison::tests::test_utf8_array_nulls' panicked
at 'called `Option::unwrap()` on a `None` value',
arrow/src/array/array_string.rs:101:40
stack backtrace:
0: rust_begin_unwind
at
/rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/panicking.rs:517:5
1: core::panicking::panic_fmt
at
/rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/panicking.rs:100:14
2: core::panicking::panic
at
/rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/panicking.rs:50:5
3: core::option::Option<T>::unwrap
at
/rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/option.rs:746:21
4:
arrow::array::array_string::GenericStringArray<OffsetSize>::value_unchecked
at ./src/array/array_string.rs:101:13
5: arrow::compute::kernels::comparison::eq_utf8::{{closure}}
at ./src/compute/kernels/comparison.rs:56:35
6: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for
&mut F>::call_once
at
/rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/ops/function.rs:280:13
7: core::option::Option<T>::map
at
/rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/option.rs:848:29
8: <core::iter::adapters::map::Map<I,F> as
core::iter::traits::iterator::Iterator>::next
at
/rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/iter/adapters/map.rs:103:9
9: arrow::buffer::mutable::MutableBuffer::from_trusted_len_iter_bool
at ./src/buffer/mutable.rs:489:38
10: arrow::compute::kernels::comparison::eq_utf8
at ./src/compute/kernels/comparison.rs:685:5
11: arrow::compute::kernels::comparison::tests::test_utf8_array_nulls
at ./src/compute/kernels/comparison.rs:2305:18
12:
arrow::compute::kernels::comparison::tests::test_utf8_array_nulls::{{closure}}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]