scovich commented on code in PR #8122: URL: https://github.com/apache/arrow-rs/pull/8122#discussion_r2274444169
########## parquet-variant-compute/src/variant_array.rs: ########## @@ -192,7 +202,96 @@ impl VariantArray { /// Return a reference to the metadata field of the [`StructArray`] pub fn metadata_field(&self) -> &BinaryViewArray { - self.shredding_state.metadata_field() + &self.metadata + } + + /// Return a reference to the value field of the `StructArray` + pub fn value_field(&self) -> Option<&BinaryViewArray> { + self.shredding_state.value_field() + } + + /// Return a reference to the typed_value field of the `StructArray`, if present + pub fn typed_value_field(&self) -> Option<&ArrayRef> { + self.shredding_state.typed_value_field() + } +} + +/// One shredded field of a partially or prefectly shredded variant. For example, suppose the +/// shredding schema for variant `v` treats it as an object with a single field `a`, where `a` is +/// itself a struct with the single field `b` of type INT. Then the physical layout of the column +/// is: +/// +/// ```text +/// v: VARIANT { +/// metadata: BINARY, +/// value: BINARY, +/// typed_value: STRUCT { +/// a: SHREDDED_VARIANT_FIELD { +/// value: BINARY, +/// typed_value: STRUCT { +/// a: SHREDDED_VARIANT_FIELD { +/// value: BINARY, +/// typed_value: INT, +/// }, +/// }, +/// }, +/// }, +/// } +/// ``` +/// +/// In the above, each row of `v.value` is either a variant value (shredding failed, `v` was not an +/// object at all) or a variant object (partial shredding, `v` was an object but included unexpected +/// fields other than `a`), or is NULL (perfect shredding, `v` was an object containing only the +/// single expected field `a`). +/// +/// A similar story unfolds for each `v.typed_value.a.value` -- a variant value if shredding failed +/// (`v:a` was not an object at all), or a variant object (`v:a` was an object with unexpected +/// additional fields), or NULL (`v:a` was an object containing only the single expected field `b`). +/// +/// Finally, `v.typed_value.a.typed_value.b.value` is either NULL (`v:a.b` was an integer) or else a +/// variant value. +pub struct ShreddedVariantFieldArray { Review Comment: From @alamb -- > If we could somehow figure out to make this be VariantArray rather than ShreddedVariantFieldArray I think that would be the most elegant / understandable That was my initial reaction as well. But the `metadata` field becomes problematic. Do we make it optional? Require it anyway and risk writing out invalid parquet if somebody forgets to strip it out or ignore it? etc -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org