sdf-jkl commented on code in PR #8354:
URL: https://github.com/apache/arrow-rs/pull/8354#discussion_r2995322497


##########
parquet-variant/src/path.rs:
##########
@@ -346,5 +346,14 @@ mod tests {
             err.to_string(),
             "Parser error: Invalid token in bracket request: `abc`. Expected a 
quoted string or a number(e.g., `['field']` or `[123]`)"
         );
+
+        // Out-of-range integer indexes are invalid path tokens.
+        let too_large_index = (usize::MAX as u128) + 1;
+        let err = 
VariantPath::try_from(format!("[{too_large_index}]").as_str()).unwrap_err();
+        assert!(
+            err.to_string()
+                .contains("Parser error: Invalid token in bracket request"),
+            "{err}"
+        );

Review Comment:
   We have these two:
   - 
https://github.com/apache/arrow-rs/pull/8354/changes#diff-9c960f865bc6facbaa5b8aa37bf6d918a2413799be4d7eaff5ddb421408b64feR1875
 index out of bounds returns Null
   - 
https://github.com/apache/arrow-rs/pull/8354/changes#diff-9c960f865bc6facbaa5b8aa37bf6d918a2413799be4d7eaff5ddb421408b64feR2797
 index on non-list returns Null
   



##########
parquet-variant-compute/src/variant_get.rs:
##########
@@ -160,7 +253,7 @@ fn shredded_get_path(
 
     // Peel away the prefix of path elements that traverses the shredded parts 
of this variant
     // column. Shredding will traverse the rest of the path on a per-row basis.
-    let mut shredding_state = input.shredding_state().borrow();
+    let mut shredding_state = input.shredding_state().clone();

Review Comment:
   `BorrowedShreddingState` is still used in unshred_variant.rs, but that's it:
   
https://github.com/apache/arrow-rs/blob/398962ec67bc777eca1c635c8ef01a9c634530eb/parquet-variant-compute/src/unshred_variant.rs#L172
   
   --------
   I tried using `Cow` before, but I was `Cow`ing the whole `ShreddingState`, 
which was unnecessary.
   
   I switched to `.clone` because value: `Option<BinaryViewArray>` is almost 
zero copy:
   ```rust
   pub type BinaryViewArray = GenericByteViewArray<BinaryViewType>;
   // ...
   pub struct GenericByteViewArray<T>
   where
       T: ByteViewType + ?Sized,
   {
       data_type: DataType, // 24 bytes
       views: ScalarBuffer<u128>, // zero copy cloning
       buffers: Arc<[Buffer]>, // zero copy cloning
       phantom: PhantomData<T>,  // zero sized
       nulls: Option<NullBuffer>,  // mostly zero copy cloning + 3usize
   }
   ```
   
   `Cow`ing just the `BinaryViewArray` should work.



##########
parquet-variant-compute/src/variant_get.rs:
##########
@@ -96,15 +171,33 @@ pub(crate) fn follow_shredded_path_element<'a>(
                 ))
             })?;
 
-            let state = BorrowedShreddingState::try_from(struct_array)?;
+            let state = ShreddingState::try_from(struct_array)?;
             Ok(ShreddedPathStep::Success(state))
         }
-        VariantPathElement::Index { .. } => {
-            // TODO: Support array indexing. Among other things, it will 
require slicing not
-            // only the array we have here, but also the corresponding 
metadata and null masks.
-            Err(ArrowError::NotYetImplemented(
-                "Pathing into shredded variant array index".into(),
-            ))
+        VariantPathElement::Index { index } => {
+            let state = match typed_value.data_type() {
+                DataType::List(_) => take_list_like_index_as_shredding_state::<
+                    GenericListArray<i32>,
+                >(typed_value.as_ref(), *index)?,

Review Comment:
   `take_list_like_index_state`?
   
   Last one still won't fit 😢 
   ```rust
           VariantPathElement::Index { index } => {
               let state = match typed_value.data_type() {
                   DataType::List(_) => 
take_list_like_index_state::<GenericListArray<i32>>(
                       typed_value.as_ref(),
                       *index,
                   )?,
                   DataType::LargeList(_) => 
take_list_like_index_state::<GenericListArray<i64>>(
                       typed_value.as_ref(),
                       *index,
                   )?,
                   DataType::ListView(_) => 
take_list_like_index_state::<GenericListViewArray<i32>>(
                       typed_value.as_ref(),
                       *index,
                   )?,
                   DataType::LargeListView(_) => take_list_like_index_state::<
                       GenericListViewArray<i64>,
                   >(typed_value.as_ref(), *index)?,
                   _ => {
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to