scovich commented on code in PR #8354:
URL: https://github.com/apache/arrow-rs/pull/8354#discussion_r2990140186
##########
parquet-variant-compute/src/variant_get.rs:
##########
@@ -96,15 +171,33 @@ pub(crate) fn follow_shredded_path_element<'a>(
))
})?;
- let state = BorrowedShreddingState::try_from(struct_array)?;
+ let state = ShreddingState::try_from(struct_array)?;
Ok(ShreddedPathStep::Success(state))
}
- VariantPathElement::Index { .. } => {
- // TODO: Support array indexing. Among other things, it will
require slicing not
- // only the array we have here, but also the corresponding
metadata and null masks.
- Err(ArrowError::NotYetImplemented(
- "Pathing into shredded variant array index".into(),
- ))
+ VariantPathElement::Index { index } => {
+ let state = match typed_value.data_type() {
+ DataType::List(_) => take_list_like_index_as_shredding_state::<
+ GenericListArray<i32>,
+ >(typed_value.as_ref(), *index)?,
Review Comment:
Aside: It just makes my eyes hurt when `fmt` does stuff like this. But I
don't know a way to make it better, unless we want to drastically shorten the
function name ☹️
##########
parquet-variant-compute/src/variant_get.rs:
##########
@@ -160,7 +253,7 @@ fn shredded_get_path(
// Peel away the prefix of path elements that traverses the shredded parts
of this variant
// column. Shredding will traverse the rest of the path on a per-row basis.
- let mut shredding_state = input.shredding_state().borrow();
+ let mut shredding_state = input.shredding_state().clone();
Review Comment:
This is ultimately cloning an `ArrayRef` (cheap) `BinaryViewArray` (not
expensive, but not cheap either -- has to go through `ArrayData`). Should we
change `ShreddingState::value` to `Option<Cow<'a, BinaryViewArray>>` to avoid
unnecessary cloning for struct-only paths? Then we could get rid of
`BorrowedShreddingState` entirely -- which BTW I think this PR currently leaves
as dead code (but no clippy warnings because it's `pub`)
##########
parquet-variant/src/path.rs:
##########
@@ -346,5 +346,14 @@ mod tests {
err.to_string(),
"Parser error: Invalid token in bracket request: `abc`. Expected a
quoted string or a number(e.g., `['field']` or `[123]`)"
);
+
+ // Out-of-range integer indexes are invalid path tokens.
+ let too_large_index = (usize::MAX as u128) + 1;
+ let err =
VariantPath::try_from(format!("[{too_large_index}]").as_str()).unwrap_err();
+ assert!(
+ err.to_string()
+ .contains("Parser error: Invalid token in bracket request"),
+ "{err}"
+ );
Review Comment:
Nice touch!
Do we have similar unit tests for verifying runtime behavior of a given
jsonpath when the underlying data doesn't "fit"? Or would that be part of the
larger fix?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]