scovich commented on code in PR #8093:
URL: https://github.com/apache/arrow-rs/pull/8093#discussion_r2271384578


##########
parquet-variant-compute/src/variant_array.rs:
##########
@@ -434,15 +439,17 @@ mod test {
     }
 
     #[test]
-    fn invalid_missing_value() {
+    fn all_null_missing_value_and_typed_value() {
         let fields = Fields::from(vec![Field::new("metadata", DataType::BinaryView, false)]);
         let array = StructArray::new(fields, vec![make_binary_view_array()], None);
-        // Should fail because the StructArray does not contain a 'value' field
-        let err = VariantArray::try_new(Arc::new(array));
-        assert_eq!(
-            err.unwrap_err().to_string(),
-            "Invalid argument error: VariantArray has neither value nor typed_value field"
-        );
+        // Should succeed and create an AllNull variant when neither value nor typed_value are present
+        let variant_array = VariantArray::try_new(Arc::new(array)).unwrap();

Review Comment:
   By a strict reading of the spec, this should actually fail, because this 
`ShreddingState` does not represent a shredded object field, but rather 
represents a top-level variant value:
   > value | typed_value | Meaning
   > -|-|-
   > null | null | The value is missing; only valid for shredded object fields
   
   But maybe that's a validation `VariantArray::try_new` should perform, not 
`ShreddingState::try_new`?
   
   Also, it would quickly become annoying if `variant_get` had to replace a missing or all-null `value` column with an all-`Variant::Null` column just to comply with the spec. Maybe that's why there's this additional tidbit?
   > If a Variant is missing in a context where a value is required, readers 
must return a Variant null (`00`): basic type 0 (primitive) and physical type 0 
(null). For example, if a Variant is required (like `measurement` above) and 
both `value` and `typed_value` are null, the returned value must be `00` 
(Variant null).
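
   The reader-side compensation the quoted spec text describes could be sketched roughly like this, in plain Rust with no arrow-rs types (`read_required_variant` and the byte-vec stand-ins are illustrative assumptions, not the crate's API):

   ```rust
   /// Illustrative sketch only: how a reader might compensate when a required
   /// Variant has both `value` and `typed_value` null. Per the spec text quoted
   /// above, the reader must return a Variant null, encoded as the single byte
   /// 0x00 (basic type 0 = primitive, physical type 0 = null).
   fn read_required_variant(value: Option<Vec<u8>>, typed_value: Option<Vec<u8>>) -> Vec<u8> {
       match (value, typed_value) {
           // Missing in a required context -> substitute Variant null (`00`)
           (None, None) => vec![0x00],
           // Unshredded (or partially shredded): pass the value bytes through
           (Some(v), _) => v,
           // Fully shredded: a real reader would re-encode typed_value as
           // Variant bytes; this sketch just passes the stand-in bytes through
           (None, Some(tv)) => tv,
       }
   }
   ```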



##########
parquet-variant-compute/src/variant_get/output/primitive.rs:
##########
@@ -155,6 +156,19 @@ impl<'a, T: ArrowPrimitiveVariant> OutputBuilder for PrimitiveOutputBuilder<'a,
             "variant_get unshredded to primitive types is not implemented yet",
         )))
     }
+
+    fn all_null(
+        &self,
+        variant_array: &VariantArray,
+        _metadata: &BinaryViewArray,
+    ) -> Result<ArrayRef> {
+        // For all-null case, create a primitive array with all null values
+        let nulls = NullBuffer::from(vec![false; variant_array.len()]);
+        let values = vec![T::default_value(); variant_array.len()];
+        let array = PrimitiveArray::<T>::new(values.into(), Some(nulls))
+            .with_data_type(self.as_type.data_type().clone());
+        Ok(Arc::new(array))

Review Comment:
   ```suggestion
           Ok(Arc::new(new_null_array(self.as_type.data_type(), variant_array.len())))
   ```
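
   For context, `new_null_array` (in `arrow::array`) builds an all-null array of any data type in one call, so no default values or hand-rolled `NullBuffer` of `false` need to be materialized. A minimal sketch, assuming the `arrow` crate is available (`null_array_stats` is just an illustrative helper):

   ```rust
   use arrow::array::{new_null_array, Array, ArrayRef};
   use arrow::datatypes::DataType;

   /// Returns (len, null_count) of an all-null array of the given type.
   fn null_array_stats(dt: &DataType, len: usize) -> (usize, usize) {
       // Every slot of the returned array is null; no values are allocated by hand.
       let arr: ArrayRef = new_null_array(dt, len);
       (arr.len(), arr.null_count())
   }
   ```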



##########
parquet-variant-compute/src/variant_array.rs:
##########
@@ -187,6 +187,7 @@ impl VariantArray {
                     typed_value_to_variant(typed_value, index)
                 }
             }
+            ShreddingState::AllNull { .. } => Variant::Null,

Review Comment:
   This is tricky... see 
https://github.com/apache/arrow-rs/pull/8122#discussion_r2271263385
   * For a top-level variant, null/null is illegal (though returning `Variant::Null` is arguably a correct way to compensate)
   * For a shredded variant field, null/null means SQL NULL, and returning `Variant::Null` is arguably incorrect (it causes the SQL `IS NULL` operator to return `FALSE`). But we don't even _have_ a way to return SQL NULL here (it would probably correspond to `Option::None`?)
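
   The missing-vs-`Variant::Null` distinction could indeed be modeled with `Option`, roughly like this (a sketch with a stand-in `Variant` enum, not the parquet-variant crate's type):

   ```rust
   /// Stand-in for the real Variant type, just to show the shape of the idea.
   #[derive(Debug, PartialEq)]
   enum Variant {
       Null,
       Int(i64),
   }

   /// For a shredded object field: value=null/typed_value=null means the field
   /// is missing (SQL NULL), modeled as None, while an explicit Variant null
   /// stays Some(Variant::Null) -- so `IS NULL` semantics remain distinguishable.
   fn shredded_field_value(value: Option<Variant>, typed_value: Option<Variant>) -> Option<Variant> {
       match (value, typed_value) {
           (None, None) => None,          // SQL NULL: field absent from the object
           (Some(v), None) => Some(v),    // unshredded value
           (None, Some(tv)) => Some(tv),  // shredded value
           (Some(v), Some(_)) => Some(v), // simplified; the real conflict rules are richer
       }
   }
   ```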



##########
parquet-variant-compute/src/variant_array.rs:
##########
@@ -488,4 +495,50 @@ mod test {
     fn make_binary_array() -> ArrayRef {
         Arc::new(BinaryArray::from(vec![b"test" as &[u8]]))
     }
+
+    #[test]
+    fn all_null_shredding_state() {
+        let metadata = BinaryViewArray::from(vec![b"test" as &[u8]]);
+        let shredding_state = ShreddingState::try_new(metadata.clone(), None, None).unwrap();
+
+        assert!(matches!(shredding_state, ShreddingState::AllNull { .. }));
+
+        // Verify metadata is preserved correctly
+        if let ShreddingState::AllNull { metadata: m } = shredding_state {
+            assert_eq!(m.len(), metadata.len());
+            assert_eq!(m.value(0), metadata.value(0));
+        }
+    }
+
+    #[test]
+    fn all_null_variant_array_construction() {
+        let metadata = BinaryViewArray::from(vec![b"test" as &[u8]; 3]);
+        let nulls = NullBuffer::from(vec![false, false, false]); // all null

Review Comment:
   Ah -- the [spec](https://github.com/apache/parquet-format/blob/master/VariantShredding.md#objects) requires that the struct ("group") containing the `value` and `typed_value` columns must be non-nullable:
   > The group for each named field must use repetition level `required`. A field's `value` and `typed_value` are set to null (missing) to indicate that the field does not exist in the variant. To encode a field that is present with a [variant/JSON, not SQL] `null` value, the `value` must contain a Variant null: basic type 0 (primitive) and physical type 0 (null).
   
   So effectively, the NULL/NULL combo becomes the null mask for that nested 
field. Which is why a top-level NULL/NULL combo is incorrect -- the top-level 
field already has its own nullability.
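
   Concretely, a shredded object field in the Parquet layout looks roughly like this (schema sketched from memory of the VariantShredding examples; the `event`/`event_type` names are illustrative):

   ```
   optional group event (VARIANT) {            // top-level nullability lives here
     required binary metadata;
     optional binary value;
     optional group typed_value {
       required group event_type {             // per-field group: repetition `required`
         optional binary value;                // value + typed_value both null => field missing
         optional binary typed_value (STRING);
       }
     }
   }
   ```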



##########
parquet-variant-compute/src/variant_array.rs:
##########
@@ -488,4 +495,50 @@ mod test {
     fn make_binary_array() -> ArrayRef {
         Arc::new(BinaryArray::from(vec![b"test" as &[u8]]))
     }
+
+    #[test]
+    fn all_null_shredding_state() {
+        let metadata = BinaryViewArray::from(vec![b"test" as &[u8]]);
+        let shredding_state = ShreddingState::try_new(metadata.clone(), None, None).unwrap();
+
+        assert!(matches!(shredding_state, ShreddingState::AllNull { .. }));
+
+        // Verify metadata is preserved correctly
+        if let ShreddingState::AllNull { metadata: m } = shredding_state {
+            assert_eq!(m.len(), metadata.len());
+            assert_eq!(m.value(0), metadata.value(0));
+        }
+    }
+
+    #[test]
+    fn all_null_variant_array_construction() {
+        let metadata = BinaryViewArray::from(vec![b"test" as &[u8]; 3]);
+        let nulls = NullBuffer::from(vec![false, false, false]); // all null

Review Comment:
   From what I understand, the treatment of NULL/NULL depends on context:
   * For a top-level variant value, it's interpreted as `Variant::Null`
   * For a shredded variant object field, it's interpreted as missing (SQL NULL)
   
   So I guess there are two ways to get SQL NULL -- null mask on the 
struct(value, typed_value), or if both value and typed_value are themselves 
NULL?
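
   Those two routes could be checked together, sketched here without arrow-rs types (`struct_is_valid` stands in for the validity bit from the null mask on the `struct(value, typed_value)`):

   ```rust
   /// Sketch: a shredded object field reads as SQL NULL either when the
   /// containing struct slot itself is null (null mask), or when both the
   /// value and typed_value columns are null for that row.
   fn is_sql_null(struct_is_valid: bool, value: Option<&[u8]>, typed_value: Option<&[u8]>) -> bool {
       !struct_is_valid || (value.is_none() && typed_value.is_none())
   }
   ```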


