Re: [PR] [Variant] Proposal for `variant_get` and Shredded `VariantArray` [arrow-rs]

via GitHub Wed, 06 Aug 2025 05:56:43 -0700


scovich commented on code in PR #8021:
URL: https://github.com/apache/arrow-rs/pull/8021#discussion_r2257063486



##########
parquet-variant-compute/src/variant_array.rs:
##########
@@ -135,36 +140,189 @@ impl VariantArray {
         self.inner
     }
 
+    /// Return the shredding state of this `VariantArray`
+    pub fn shredding_state(&self) -> &ShreddingState {
+        &self.shredding_state
+    }
+
     /// Return the [`Variant`] instance stored at the given row
     ///
-    /// Panics if the index is out of bounds.
+    /// Consistently with other Arrow arrays types, this API requires you to
+    /// check for nulls first using [`Self::is_valid`].
+    ///
+    /// # Panics
+    /// * if the index is out of bounds
+    /// * if the array value is null
+    ///
+    /// If this is a shredded variant but has no value at the shredded 
location, it
+    /// will return [`Variant::Null`].
+    ///
+    ///
+    /// # Performance Note
+    ///
+    /// This is certainly not the most efficient way to access values in a
+    /// `VariantArray`, but it is useful for testing and debugging.
     ///
     /// Note: Does not do deep validation of the [`Variant`], so it is up to 
the
     /// caller to ensure that the metadata and value were constructed 
correctly.
     pub fn value(&self, index: usize) -> Variant {
-        let metadata = self.metadata_field().as_binary_view().value(index);
-        let value = self.value_field().as_binary_view().value(index);
-        Variant::new(metadata, value)
+        match &self.shredding_state {
+            ShreddingState::Unshredded { metadata, value } => {
+                Variant::new(metadata.value(index), value.value(index))
+            }
+            ShreddingState::FullyShredded {
+                metadata: _,
+                typed_value,
+            } => {
+                if typed_value.is_null(index) {
+                    Variant::Null
+                } else {
+                    typed_value_to_variant(typed_value, index)
+                }
+            }
+            ShreddingState::PartiallyShredded {
+                metadata,
+                value,
+                typed_value,
+            } => {
+                if typed_value.is_null(index) {
+                    Variant::new(metadata.value(index), value.value(index))
+                } else {
+                    typed_value_to_variant(typed_value, index)
+                }
+            }
+        }

Review Comment:
   > The reason I first introduced the enum was that in `variant_get` I needed 
to dispatch to different methods based on the type of shredding, and I found it 
myself really wanting to have names for them rather than just the combinations 
of value and typed value.
   
   I guess... but it seems like there's so much code just managing the 
resulting enum variants I question whether it's a net win. Like the getters for 
metadata, value, etc. or the above code that shrinks from 25 lines to seven 
while handling more cases. 
   
   Clearly either way can be made correct, tho, so I guess it's a matter of 
preference.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] [Variant] Proposal for `variant_get` and Shredded `VariantArray` [arrow-rs]

Reply via email to