sdf-jkl commented on code in PR #8354:
URL: https://github.com/apache/arrow-rs/pull/8354#discussion_r2995322497
##########
parquet-variant/src/path.rs:
##########
@@ -346,5 +346,14 @@ mod tests {
err.to_string(),
"Parser error: Invalid token in bracket request: `abc`. Expected a
quoted string or a number(e.g., `['field']` or `[123]`)"
);
+
+ // Out-of-range integer indexes are invalid path tokens.
+ let too_large_index = (usize::MAX as u128) + 1;
+ let err =
VariantPath::try_from(format!("[{too_large_index}]").as_str()).unwrap_err();
+ assert!(
+ err.to_string()
+ .contains("Parser error: Invalid token in bracket request"),
+ "{err}"
+ );
Review Comment:
We have these two:
-
https://github.com/apache/arrow-rs/pull/8354/changes#diff-9c960f865bc6facbaa5b8aa37bf6d918a2413799be4d7eaff5ddb421408b64feR1875
index out of bounds returns Null
-
https://github.com/apache/arrow-rs/pull/8354/changes#diff-9c960f865bc6facbaa5b8aa37bf6d918a2413799be4d7eaff5ddb421408b64feR2797
index on non-list returns Null
##########
parquet-variant-compute/src/variant_get.rs:
##########
@@ -160,7 +253,7 @@ fn shredded_get_path(
// Peel away the prefix of path elements that traverses the shredded parts
of this variant
// column. Shredding will traverse the rest of the path on a per-row basis.
- let mut shredding_state = input.shredding_state().borrow();
+ let mut shredding_state = input.shredding_state().clone();
Review Comment:
`BorrowedShreddingState` is still used in unshred_variant.rs, but that's it:
https://github.com/apache/arrow-rs/blob/398962ec67bc777eca1c635c8ef01a9c634530eb/parquet-variant-compute/src/unshred_variant.rs#L172
--------
I tried using `Cow` before, but I was `Cow`ing the whole `ShreddingState`,
which was unnecessary.
I switched to `.clone` because value: `Option<BinaryViewArray>` is almost
zero copy:
```rust
pub type BinaryViewArray = GenericByteViewArray<BinaryViewType>;
// ...
pub struct GenericByteViewArray<T>
where
T: ByteViewType + ?Sized,
{
data_type: DataType, // 24 bytes
views: ScalarBuffer<u128>, // zero copy cloning
buffers: Arc<[Buffer]>, // zero copy cloning
phantom: PhantomData<T>, // zero sized
nulls: Option<NullBuffer>, // mostly zero copy cloning + 3usize
}
```
`Cow`ing just the `BinaryViewArray` should work.
##########
parquet-variant-compute/src/variant_get.rs:
##########
@@ -96,15 +171,33 @@ pub(crate) fn follow_shredded_path_element<'a>(
))
})?;
- let state = BorrowedShreddingState::try_from(struct_array)?;
+ let state = ShreddingState::try_from(struct_array)?;
Ok(ShreddedPathStep::Success(state))
}
- VariantPathElement::Index { .. } => {
- // TODO: Support array indexing. Among other things, it will
require slicing not
- // only the array we have here, but also the corresponding
metadata and null masks.
- Err(ArrowError::NotYetImplemented(
- "Pathing into shredded variant array index".into(),
- ))
+ VariantPathElement::Index { index } => {
+ let state = match typed_value.data_type() {
+ DataType::List(_) => take_list_like_index_as_shredding_state::<
+ GenericListArray<i32>,
+ >(typed_value.as_ref(), *index)?,
Review Comment:
`take_list_like_index_state`?
Last one still won't fit 😢
```rust
VariantPathElement::Index { index } => {
let state = match typed_value.data_type() {
DataType::List(_) =>
take_list_like_index_state::<GenericListArray<i32>>(
typed_value.as_ref(),
*index,
)?,
DataType::LargeList(_) =>
take_list_like_index_state::<GenericListArray<i64>>(
typed_value.as_ref(),
*index,
)?,
DataType::ListView(_) =>
take_list_like_index_state::<GenericListViewArray<i32>>(
typed_value.as_ref(),
*index,
)?,
DataType::LargeListView(_) => take_list_like_index_state::<
GenericListViewArray<i64>,
>(typed_value.as_ref(), *index)?,
_ => {
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]