scovich commented on code in PR #8122:
URL: https://github.com/apache/arrow-rs/pull/8122#discussion_r2274444169


##########
parquet-variant-compute/src/variant_array.rs:
##########
@@ -192,7 +202,96 @@ impl VariantArray {
 
     /// Return a reference to the metadata field of the [`StructArray`]
     pub fn metadata_field(&self) -> &BinaryViewArray {
-        self.shredding_state.metadata_field()
+        &self.metadata
+    }
+
+    /// Return a reference to the value field of the `StructArray`
+    pub fn value_field(&self) -> Option<&BinaryViewArray> {
+        self.shredding_state.value_field()
+    }
+
+    /// Return a reference to the typed_value field of the `StructArray`, if present
+    pub fn typed_value_field(&self) -> Option<&ArrayRef> {
+        self.shredding_state.typed_value_field()
+    }
+}
+
+/// One shredded field of a partially or perfectly shredded variant. For example, suppose the
+/// shredding schema for variant `v` treats it as an object with a single field `a`, where `a` is
+/// itself a struct with the single field `b` of type INT. Then the physical layout of the column
+/// is:
+///
+/// ```text
+/// v: VARIANT {
+///     metadata: BINARY,
+///     value: BINARY,
+///     typed_value: STRUCT {
+///         a: SHREDDED_VARIANT_FIELD {
+///             value: BINARY,
+///             typed_value: STRUCT {
+///                 b: SHREDDED_VARIANT_FIELD {
+///                     value: BINARY,
+///                     typed_value: INT,
+///                 },
+///             },
+///         },
+///     },
+/// }
+/// ```
+///
+/// In the above, each row of `v.value` is either a variant value (shredding failed, `v` was not an
+/// object at all) or a variant object (partial shredding, `v` was an object but included unexpected
+/// fields other than `a`), or is NULL (perfect shredding, `v` was an object containing only the
+/// single expected field `a`).
+///
+/// A similar story unfolds for each `v.typed_value.a.value` -- a variant value if shredding failed
+/// (`v:a` was not an object at all), or a variant object (`v:a` was an object with unexpected
+/// additional fields), or NULL (`v:a` was an object containing only the single expected field `b`).
+///
+/// Finally, `v.typed_value.a.typed_value.b.value` is either NULL (`v:a.b` was an integer) or else a
+/// variant value.
+pub struct ShreddedVariantFieldArray {

Review Comment:
   From @alamb -- 
   > If we could somehow figure out to make this be VariantArray rather than ShreddedVariantFieldArray I think that would be the most elegant / understandable
   
   That was my initial reaction as well. But the `metadata` field becomes problematic. Do we make it optional? Or require it anyway and risk writing out invalid Parquet if somebody forgets to strip it out or ignore it? Etc.
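
   To make the trade-off concrete, here is a rough sketch of the two shapes. It is not the code in this PR: `BinaryColumn`, `UnifiedVariant`, `TopLevelVariant`, `ShreddedField`, `check_nested`, and `nested_columns` are all hypothetical names, and a plain `Vec` stands in for `BinaryViewArray` so the snippet compiles without Arrow. It assumes nested shredded fields may only carry `value` and `typed_value` (the latter omitted here for brevity).

```rust
// Illustrative stand-ins only: the real crate uses Arrow arrays such as
// `BinaryViewArray`; every type and function below is hypothetical.

/// One optional byte string per row, standing in for a binary column.
pub type BinaryColumn = Vec<Option<Vec<u8>>>;

/// Option A: one type for every nesting level, with `metadata` optional.
pub struct UnifiedVariant {
    /// Expected to be `Some` only at the top level.
    pub metadata: Option<BinaryColumn>,
    pub value: Option<BinaryColumn>,
}

/// With a single type, "nested fields carry no metadata" becomes a runtime
/// check that every writer has to remember to call.
pub fn check_nested(field: &UnifiedVariant) -> Result<(), String> {
    if field.metadata.is_some() {
        Err("nested shredded field must not carry metadata".to_string())
    } else {
        Ok(())
    }
}

/// Column names a writer would emit for a nested field under option A
/// if it skipped the check above.
pub fn nested_columns(field: &UnifiedVariant) -> Vec<&'static str> {
    let mut columns = Vec::new();
    if field.metadata.is_some() {
        columns.push("metadata");
    }
    if field.value.is_some() {
        columns.push("value");
    }
    columns
}

/// Option B: two types, so the compiler enforces the rule.
pub struct TopLevelVariant {
    pub metadata: BinaryColumn,
    pub value: Option<BinaryColumn>,
}

/// A nested shredded field has no `metadata` slot at all.
pub struct ShreddedField {
    pub value: Option<BinaryColumn>,
}

impl From<TopLevelVariant> for ShreddedField {
    /// Reusing a top-level variant as a nested field drops `metadata` explicitly.
    fn from(top: TopLevelVariant) -> Self {
        ShreddedField { value: top.value }
    }
}
```

   Under option A, a caller who forgets `check_nested` gets a stray `metadata` column from `nested_columns`; under option B that state cannot be represented, at the cost of a second type.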


