scovich commented on issue #8609:
URL: https://github.com/apache/arrow-rs/issues/8609#issuecomment-3403173750
I think you've identified two separate issues:
> Probing a `VariantArray` when that element is null will result in a panic
This is arguably a bug that needs fixing, and the immediate fix is likely a
one-liner:
<details>
```diff
pub fn value(&self, index: usize) -> Variant<'_, '_> {
match (self.typed_value_field(), self.value_field()) {
// Always prefer typed_value, if available
(Some(typed_value), value) if typed_value.is_valid(index) => {
typed_value_to_variant(typed_value, value, index)
}
// Otherwise fall back to value, if available
- (_, Some(value)) if value.is_valid(index) => {
+ (_, Some(value)) if self.metadata.is_valid(index) &&
value.is_valid(index) => {
Variant::new(self.metadata.value(index), value.value(index))
}
// It is technically invalid for neither value nor typed_value
fields to be available,
// but the spec specifically requires readers to return
Variant::Null in this case.
_ => Variant::Null,
}
```
</details>
... but only because `VariantArray::typed_value_to_variant` doesn't support
partially shredded objects yet.
<details>
Once we support those, that helper will need to take the `metadata` column
as an additional arg, and the lifetime annotations will likely need to change
as well. Actually... I think the method _cannot_ support partially shredded
objects because `Variant` is fundamentally a reference type, and returning a
single `Variant` for a partially shredded object would require reassembling it.
Or adding the new `Variant::ShreddedObject` enum variant that @alamb has
mentioned a couple of times as a way to capture all the raw bytes as-is,
without trying to reassemble them.
If we always unshred all primitive types, then it _might_ be as simple as
adding new enum variants `ShreddedObject(ShreddedVariantObject)` and
`ShreddedList(Vec<Variant<'m, 'v>)` where:
```rust
struct ShreddedVariantObject<'m, 'v> {
unshredded_fields: Option<VariantObject<'m, 'v>>,
shredded_fields: Vec<(&'m str, Variant<'m, 'v>)>,
}
```
The hard part is _populating_ it, not defining it...
</details>
Meanwhile, there are some other panics lurking in that method, which we
should probably track fixes for as well?
> This seems like an odd pattern, as it's not clear exactly what a user
should do when iterating or probing through a `VariantArray`
For any arrow array, a caller who cares about nulls needs to check
`Array::is_null` or `Array::is_valid` before invoking the `value` method. A
blind `v.value(0)` is never recommended (but it still shouldn't panic).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]