scovich commented on code in PR #8122:
URL: https://github.com/apache/arrow-rs/pull/8122#discussion_r2271520102
##########
parquet-variant-compute/src/variant_array.rs:
##########
@@ -229,107 +328,138 @@ pub enum ShreddingState {
     // TODO: add missing state where there is neither value nor typed_value
     // Missing { metadata: BinaryViewArray },
     /// This variant has no typed_value field
-    Unshredded {
-        metadata: BinaryViewArray,
-        value: BinaryViewArray,
-    },
+    Unshredded { value: BinaryViewArray },
     /// This variant has a typed_value field and no value field
     /// meaning it is the shredded type
-    Typed {
-        metadata: BinaryViewArray,
-        typed_value: ArrayRef,
-    },
-    /// Partially shredded:
-    /// * value is an object
-    /// * typed_value is a shredded object.
+    PerfectlyShredded { typed_value: ArrayRef },

Review Comment:
Another wrinkle: The [shredding spec](https://github.com/apache/parquet-format/blob/master/VariantShredding.md#objects) requires that the struct ("group") containing `value` and `typed_value` columns must be non-nullable:

> The group for each named field must use repetition level `required`. A field's `value` and `typed_value` are set to null (missing) to indicate that the field does not exist in the variant. To encode a field that is present [not SQL NULL] with a [variant/JSON] `null` value, the `value` must contain a Variant null: basic type 0 (primitive) and physical type 0 (null).

So when `variant_get` extracts a shredded object field, that field is physically non-nullable and we would need to compute a null mask for it as the union of all ancestors' null masks, plus a computed null mask for the field itself: any `(value=NULL, typed_value=NULL)` pair produces a null in the null mask, while `(value=Variant::Null, typed_value=NULL)` does not (but casting it to a concrete type will produce SQL NULL even under the strictest casting semantics).
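
To make the null-mask rule concrete, here is a rough sketch of what I have in mind. It models the columns with plain `Vec`s rather than arrow arrays, and `ShreddedField`, `field_validity`, and `strict_cast_to_i64` are hypothetical names for illustration only, not existing APIs in this crate. It also assumes a Variant null is encoded as the single header byte `0x00` (basic type 0, physical type 0). Masks are expressed as validity (`true` = non-null), so the "union of null masks" becomes a logical AND of validity bits.

```rust
/// Assumed encoding of a Variant null: basic type 0 (primitive),
/// physical type 0 (null), i.e. a single zero header byte.
const VARIANT_NULL: u8 = 0x00;

/// One shredded object field, modeled with plain vectors.
/// Hypothetical type for illustration; not part of parquet-variant-compute.
struct ShreddedField {
    /// Residual variant bytes per row; `None` means the `value` column is NULL.
    value: Vec<Option<Vec<u8>>>,
    /// Shredded values per row; `None` means the `typed_value` column is NULL.
    typed_value: Vec<Option<i64>>,
}

/// Validity (true = non-null) of the extracted field: every ancestor must be
/// valid, and at least one of `value` / `typed_value` must be non-NULL.
fn field_validity(ancestors: &[Vec<bool>], field: &ShreddedField) -> Vec<bool> {
    (0..field.value.len())
        .map(|row| {
            let ancestors_valid = ancestors.iter().all(|mask| mask[row]);
            let present =
                field.value[row].is_some() || field.typed_value[row].is_some();
            ancestors_valid && present
        })
        .collect()
}

/// Strict cast to i64: a Variant null still becomes SQL NULL, while any other
/// residual `value` is reported as an error.
fn strict_cast_to_i64(
    validity: &[bool],
    field: &ShreddedField,
) -> Result<Vec<Option<i64>>, String> {
    validity
        .iter()
        .enumerate()
        .map(|(row, &valid)| {
            match (valid, &field.value[row], field.typed_value[row]) {
                // Masked out by an ancestor, or the field is missing.
                (false, _, _) => Ok(None),
                // Shredded value is available.
                (true, _, Some(v)) => Ok(Some(v)),
                // Present in the null mask, but holds a Variant null.
                (true, Some(bytes), None)
                    if bytes.as_slice() == [VARIANT_NULL].as_slice() =>
                {
                    Ok(None)
                }
                // Any other residual value cannot be cast strictly.
                (true, _, None) => Err(format!("row {}: value is not an i64", row)),
            }
        })
        .collect()
}
```

With one ancestor mask and four rows `(typed=7)`, `(value=Variant::Null)`, `(NULL, NULL)`, and `(typed=9, ancestor null)`, the field validity in this sketch is `[true, true, false, false]`, and the strict cast yields `[Some(7), None, None, None]`.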