scovich commented on code in PR #8122:
URL: https://github.com/apache/arrow-rs/pull/8122#discussion_r2271520102


##########
parquet-variant-compute/src/variant_array.rs:
##########
@@ -229,107 +328,138 @@ pub enum ShreddingState {
     // TODO: add missing state where there is neither value nor typed_value
     // Missing { metadata: BinaryViewArray },
     /// This variant has no typed_value field
-    Unshredded {
-        metadata: BinaryViewArray,
-        value: BinaryViewArray,
-    },
+    Unshredded { value: BinaryViewArray },
     /// This variant has a typed_value field and no value field
     /// meaning it is the shredded type
-    Typed {
-        metadata: BinaryViewArray,
-        typed_value: ArrayRef,
-    },
-    /// Partially shredded:
-    /// * value is an object
-    /// * typed_value is a shredded object.
+    PerfectlyShredded { typed_value: ArrayRef },

Review Comment:
   Another wrinkle: The [shredding spec](https://github.com/apache/parquet-format/blob/master/VariantShredding.md#objects) requires the struct ("group") containing the `value` and `typed_value` columns to be non-nullable:
   > The group for each named field must use repetition level `required`. A 
field's `value` and `typed_value` are set to null (missing) to indicate that 
the field does not exist in the variant. To encode a field that is present [not 
SQL NULL] with a [variant/JSON] `null` value, the `value` must contain a 
Variant null: basic type 0 (primitive) and physical type 0 (null).
   
   So when `variant_get` extracts a shredded object field, that field is physically non-nullable, and we would need to compute its null mask as the union of all ancestors' null masks and a null mask computed for the field itself. For the field's own mask, any `(value=NULL, typed_value=NULL)` pair produces a null, while `(value=Variant::Null, typed_value=NULL)` does not (although casting it to a concrete type will still produce SQL NULL, even under the strictest casting semantics).
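   
   To make the mask computation concrete, here is a minimal sketch in plain Rust. It is not arrow-rs code: `field_validity` is a hypothetical helper, and the `bool` slices (true = non-null) are stand-ins I'm assuming for the real null buffers of the ancestors and of the `value` / `typed_value` columns. Note that a `value` slot holding `Variant::Null` is physically non-null, so it counts as valid here.

```rust
/// Hypothetical helper: per-row validity (true = non-null) of an extracted
/// shredded object field. A row is valid only if every ancestor is valid
/// and at least one of `value` / `typed_value` is physically non-null.
fn field_validity(
    ancestors: &[&[bool]],
    value_valid: &[bool],
    typed_value_valid: &[bool],
) -> Vec<bool> {
    assert_eq!(value_valid.len(), typed_value_valid.len());
    (0..value_valid.len())
        .map(|i| {
            // Union of the ancestors' null masks == AND of their validity.
            let ancestors_valid = ancestors.iter().all(|a| a[i]);
            // (value=NULL, typed_value=NULL) means the field is missing.
            let field_present = value_valid[i] || typed_value_valid[i];
            ancestors_valid && field_present
        })
        .collect()
}

fn main() {
    // row 0: typed_value present
    // row 1: value = Variant::Null (non-null bytes), typed_value NULL
    // row 2: value NULL, typed_value NULL -> field missing
    // row 3: field present, but the parent struct is null
    let parent = [true, true, true, false];
    let value = [false, true, false, true];
    let typed_value = [true, false, false, false];

    let validity = field_validity(&[&parent[..]], &value, &typed_value);
    assert_eq!(validity, vec![true, true, false, false]);
    println!("{:?}", validity);
}
```

   Row 1 stays valid in the mask; turning it into SQL NULL would be the job of the later cast to a concrete type, not of this mask.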



