scovich commented on code in PR #8021:
URL: https://github.com/apache/arrow-rs/pull/8021#discussion_r2257063486
##########
parquet-variant-compute/src/variant_array.rs:
##########
@@ -135,36 +140,189 @@ impl VariantArray {
self.inner
}
+ /// Return the shredding state of this `VariantArray`
+ pub fn shredding_state(&self) -> &ShreddingState {
+ &self.shredding_state
+ }
+
/// Return the [`Variant`] instance stored at the given row
///
- /// Panics if the index is out of bounds.
+ /// Consistently with other Arrow arrays types, this API requires you to
+ /// check for nulls first using [`Self::is_valid`].
+ ///
+ /// # Panics
+ /// * if the index is out of bounds
+ /// * if the array value is null
+ ///
+ /// If this is a shredded variant but has no value at the shredded
location, it
+ /// will return [`Variant::Null`].
+ ///
+ ///
+ /// # Performance Note
+ ///
+ /// This is certainly not the most efficient way to access values in a
+ /// `VariantArray`, but it is useful for testing and debugging.
///
/// Note: Does not do deep validation of the [`Variant`], so it is up to
the
/// caller to ensure that the metadata and value were constructed
correctly.
pub fn value(&self, index: usize) -> Variant {
- let metadata = self.metadata_field().as_binary_view().value(index);
- let value = self.value_field().as_binary_view().value(index);
- Variant::new(metadata, value)
+ match &self.shredding_state {
+ ShreddingState::Unshredded { metadata, value } => {
+ Variant::new(metadata.value(index), value.value(index))
+ }
+ ShreddingState::FullyShredded {
+ metadata: _,
+ typed_value,
+ } => {
+ if typed_value.is_null(index) {
+ Variant::Null
+ } else {
+ typed_value_to_variant(typed_value, index)
+ }
+ }
+ ShreddingState::PartiallyShredded {
+ metadata,
+ value,
+ typed_value,
+ } => {
+ if typed_value.is_null(index) {
+ Variant::new(metadata.value(index), value.value(index))
+ } else {
+ typed_value_to_variant(typed_value, index)
+ }
+ }
+ }
Review Comment:
> The reason I first introduced the enum was that in `variant_get` I needed
to dispatch to different methods based on the type of shredding, and I found it
myself really wanting to have names for them rather than just the combinations
of value and typed value.
I guess... but it seems like there's so much code just managing the
resulting enum variants I question whether it's a net win. Like the getters for
metadata, value, etc. or the above code that shrinks from 25 lines to seven
while handling more cases.
Clearly either way can be made correct, tho, so I guess it's a matter of
preference.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]