alamb commented on code in PR #9338:
URL: https://github.com/apache/arrow-rs/pull/9338#discussion_r2761189831
##########
arrow-string/src/substring.rs:
##########
@@ -360,24 +345,29 @@ fn fixed_size_binary_substring(
})
.for_each(|(start, end)| new_values.extend_from_slice(&data[start..end]));
- let array_data = unsafe {
- ArrayData::new_unchecked(
- DataType::FixedSizeBinary(new_len),
- num_of_elements,
- None,
- array.nulls().map(|b| b.inner().sliced()),
- 0,
- vec![new_values.into()],
- vec![],
- )
+ let nulls = if new_len == 0 {
+ // FixedSizeBinaryArray::new takes length from the values buffer, except when size == 0.
+ // In that case it uses the null buffer length, so preserve the original length here.
+ // Example: ["", "", ""] -> substring(..., 1, Some(2)) should keep len=3;
+ // otherwise it collapses to an empty array (len=0).
+ array
+ .nulls()
+ .cloned()
+ .or_else(|| Some(NullBuffer::new_valid(num_of_elements)))
Review Comment:
Why does it need a null buffer if the input didn't have one?
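For context, the length rule described in the diff comment above can be modelled roughly as follows. This is only a std-only sketch: `derived_len` and its parameters are hypothetical names for illustration, not the arrow-rs API, and the behavior is taken from the comment in the diff rather than from the actual constructor.

```rust
/// Hypothetical model (not the arrow-rs API) of how a fixed-size binary
/// array could derive its element count from its buffers.
fn derived_len(values_len: usize, size: usize, nulls_len: Option<usize>) -> usize {
    if size == 0 {
        // Zero-width values occupy no bytes, so the values buffer cannot
        // encode a count. Fall back to the null buffer length, or to 0
        // when there is no null buffer at all.
        nulls_len.unwrap_or(0)
    } else {
        // Normal case: the count is the number of `size`-byte chunks.
        values_len / size
    }
}
```

Under this model, three empty values without a null buffer give `derived_len(0, 0, None) == 0`, while supplying an all-valid null buffer of length 3 gives `derived_len(0, 0, Some(3)) == 3`, which seems to be the case the new code guards against.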
##########
arrow-string/src/substring.rs:
##########
@@ -212,18 +211,12 @@ pub fn substring_by_char<OffsetSize: OffsetSizeTrait>(
}
new_offsets.append(OffsetSize::from_usize(vals.len()).unwrap());
});
- let data = unsafe {
- ArrayData::new_unchecked(
- GenericStringArray::<OffsetSize>::DATA_TYPE,
- array.len(),
- None,
- array.nulls().map(|b| b.inner().sliced()),
Review Comment:
Likewise, we lost a `sliced()` call here.
##########
arrow-string/src/regexp.rs:
##########
@@ -180,7 +180,6 @@ pub fn regexp_is_match_scalar<'a, S>(
where
&'a S: StringArrayType<'a>,
{
- let null_bit_buffer = array.nulls().map(|x| x.inner().sliced());
Review Comment:
The `sliced` call seems to have been lost 🤔, yet all the tests pass.
I am not sure how important it is to "materialize" the buffer like this, but
I recommend we keep the behavior the same.
So something like:
```rust
let nulls = array.nulls().sliced().filter(|n| n.null_count() > 0);
```
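To illustrate what "materializing" means here, below is a minimal std-only sketch, assuming a validity bitmap stored as LSB-first packed bits and viewed through a bit offset. `BitView` and its methods are hypothetical stand-ins, not the arrow-rs types; the point is only that `sliced` copies the viewed bits into a fresh buffer that starts at bit 0.

```rust
/// Hypothetical stand-in for a validity bitmap: packed bits (LSB first)
/// viewed through a bit offset and a length. Not the arrow-rs type.
struct BitView {
    bytes: Vec<u8>,
    offset: usize,
    len: usize,
}

impl BitView {
    /// Read logical bit `i`, i.e. physical bit `offset + i`.
    fn get(&self, i: usize) -> bool {
        let bit = self.offset + i;
        (self.bytes[bit / 8] >> (bit % 8)) & 1 == 1
    }

    /// Copy the viewed bits into a new buffer whose first bit is logical
    /// bit 0, so consumers no longer need to know about the offset.
    fn sliced(&self) -> Vec<u8> {
        let mut out = vec![0u8; (self.len + 7) / 8];
        for i in 0..self.len {
            if self.get(i) {
                out[i / 8] |= 1 << (i % 8);
            }
        }
        out
    }
}
```

For example, a view over `0b1011_0100` with `offset: 2` and `len: 4` covers the bits 1, 0, 1, 1, and `sliced()` repacks them as the single byte `0b0000_1101`. Code that reads the raw bytes without applying the offset would see different bits before and after this step, which is presumably why dropping the call could matter even though the tests pass.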