klion26 commented on code in PR #8354:
URL: https://github.com/apache/arrow-rs/pull/8354#discussion_r2609654290
##########
parquet-variant-compute/src/variant_get.rs:
##########
@@ -97,14 +98,67 @@ pub(crate) fn follow_shredded_path_element<'a>(
})?;
let state = BorrowedShreddingState::try_from(struct_array)?;
- Ok(ShreddedPathStep::Success(state))
+ Ok(ShreddedPathStep::Success(state.into()))
}
- VariantPathElement::Index { .. } => {
+ VariantPathElement::Index { index } => {
// TODO: Support array indexing. Among other things, it will require slicing not
// only the array we have here, but also the corresponding metadata and null masks.
- Err(ArrowError::NotYetImplemented(
- "Pathing into shredded variant array index".into(),
- ))
+ let Some(list_array) = typed_value.as_any().downcast_ref::<GenericListArray<i32>>()
+ else {
+ // Downcast failure - if strict cast options are enabled, this should be an error
+ if !cast_options.safe {
+ return Err(ArrowError::CastError(format!(
+ "Cannot access index '{}' on non-list type: {}",
+ index,
+ typed_value.data_type()
+ )));
+ }
+ // With safe cast options, return NULL (missing_path_step)
+ return Ok(missing_path_step());
+ };
+
+ let offsets = list_array.offsets();
+ let values = list_array.values(); // This is a StructArray
+
+ let Some(struct_array) = values.as_any().downcast_ref::<StructArray>() else {
+ return Ok(missing_path_step());
+ };
+
+ let Some(typed_array) = struct_array.column_by_name("typed_value") else {
+ return Ok(missing_path_step());
+ };
+
+ // Build the list of indices to take
+ let mut take_indices = Vec::with_capacity(list_array.len());
+ for i in 0..list_array.len() {
+ let start = offsets[i] as usize;
+ let end = offsets[i + 1] as usize;
+ let len = end - start;
+
+ if *index < len {
Review Comment:
Sorry for not describing it clearly.
When the data `["comedy", "drama"], ["horror", 123]` is converted to a shredded variant:
- `comedy`, `drama` and `horror` are stored in the `typed_value` column,
- and `123` is stored in the `value` column (its type is incompatible).
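To make the layout concrete, here is a minimal std-only sketch of how list elements could be split between the two columns. `Element`, `ShreddedElement` and `shred_lists` are hypothetical names, not the arrow-rs or parquet-variant API, and for simplicity the fallback keeps the original element instead of encoded variant bytes.

```rust
/// Hypothetical stand-in for an unshredded variant list element.
#[derive(Debug, Clone, PartialEq)]
enum Element {
    Str(String),
    Int(i64),
}

/// One slot of the shredded list's child struct: for a non-null element,
/// exactly one of the two fields is populated.
#[derive(Debug, Clone, PartialEq)]
struct ShreddedElement {
    /// Fallback column for elements that don't match the shredded type
    /// (modeled as the original element rather than variant bytes).
    value: Option<Element>,
    /// Shredded column; only string elements land here.
    typed_value: Option<String>,
}

/// Flattens list rows into list offsets plus a flat child array,
/// routing each element to `typed_value` or `value`.
fn shred_lists(rows: &[Vec<Element>]) -> (Vec<i32>, Vec<ShreddedElement>) {
    let mut offsets = vec![0i32];
    let mut elements = Vec::new();
    for row in rows {
        for e in row {
            let slot = match e {
                Element::Str(s) => ShreddedElement { value: None, typed_value: Some(s.clone()) },
                other => ShreddedElement { value: Some(other.clone()), typed_value: None },
            };
            elements.push(slot);
        }
        // The end offset of this row is the number of elements emitted so far.
        offsets.push(elements.len() as i32);
    }
    (offsets, elements)
}
```

For the example rows this yields offsets `[0, 2, 4]`, and flat slot 3 (`123`) has a null `typed_value`, so a `take` over `typed_value` alone only sees null there.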
Here, we retrieve all the results from the `typed_value` column (the `take` in line 148), but `["horror", 123][1]` (the second item in the list) should return null if `CastOptions::safe = true` and `Err` if `CastOptions::safe = false`. Currently, we return null in both cases.
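A minimal std-only sketch of the behavior described above, under stated assumptions: `Slot`, `get_index` and the local `CastOptions` (which only mirrors the `safe` flag) are hypothetical, and an out-of-range index is assumed to yield null in both modes. It walks the list offsets like the `take_indices` loop in the diff, then decides per element: a shredded string is returned, while an element that fell back to `value` becomes null when `safe` is true and an error when it is false.

```rust
/// Local stand-in for the `safe` flag of arrow's `CastOptions` (hypothetical).
#[derive(Clone, Copy, Debug)]
struct CastOptions {
    safe: bool,
}

#[derive(Debug, PartialEq)]
struct CastError(String);

/// One element of the shredded list's child array (hypothetical model).
struct Slot {
    /// Populated when the element matched the shredded type (string).
    typed_value: Option<String>,
    /// True when the element fell back to the `value` column.
    in_value: bool,
}

/// Extracts element `index` of every list row, using list `offsets`
/// (length = rows + 1) into the flat `slots` child array.
fn get_index(
    offsets: &[i32],
    slots: &[Slot],
    index: usize,
    opts: CastOptions,
) -> Result<Vec<Option<String>>, CastError> {
    let mut out = Vec::with_capacity(offsets.len().saturating_sub(1));
    for w in offsets.windows(2) {
        let (start, end) = (w[0] as usize, w[1] as usize);
        // Assumption: an out-of-range index is a missing path -> NULL.
        if index >= end - start {
            out.push(None);
            continue;
        }
        let slot = &slots[start + index];
        match &slot.typed_value {
            Some(s) => out.push(Some(s.clone())),
            // Incompatible type stored in `value`: strict mode reports an error...
            None if slot.in_value && !opts.safe => {
                return Err(CastError(format!("list element {index} is not a string")));
            }
            // ...while safe mode (or a genuinely null element) yields NULL.
            None => out.push(None),
        }
    }
    Ok(out)
}
```

With the example rows, index 1 returns `[Some("drama"), None]` under `safe = true` and an `Err` under `safe = false`, which is the distinction the current code does not make yet.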
There may be something trickier here (maybe we need a design note for this, as discussed in this [comment](https://github.com/apache/arrow-rs/issues/8082#issuecomment-3258435567)), such as:
- if the `target_type` we request here is not `list`/`struct`, we can use the logic here and respect `CastOptions::safe`;
- should we handle [variant nesting](https://github.com/apache/parquet-format/blob/master/VariantShredding.md#nesting) here or somewhere else?
- I don't have an answer for this yet. I'll try to find some time next week to look into it.