scovich commented on code in PR #8365:
URL: https://github.com/apache/arrow-rs/pull/8365#discussion_r2356687243


##########
parquet-variant-compute/src/variant_array.rs:
##########
@@ -24,12 +24,54 @@ use arrow::datatypes::{
     Float16Type, Float32Type, Float64Type, Int16Type, Int32Type, Int64Type, 
Int8Type, UInt16Type,
     UInt32Type, UInt64Type, UInt8Type,
 };
+use arrow_schema::extension::ExtensionType;
 use arrow_schema::{ArrowError, DataType, Field, FieldRef, Fields};
 use parquet_variant::Uuid;
 use parquet_variant::Variant;
 use std::any::Any;
 use std::sync::Arc;
 
+/// Variant Canonical Extension Type
+pub struct VariantType;
+
+impl ExtensionType for VariantType {
+    const NAME: &'static str = "parquet.variant";
+
+    // Variants have no extension metadata
+    type Metadata = ();
+
+    fn metadata(&self) -> &Self::Metadata {
+        &()
+    }
+
+    fn serialize_metadata(&self) -> Option<String> {
+        None
+    }
+
+    fn deserialize_metadata(_metadata: Option<&str>) -> Result<Self::Metadata, 
ArrowError> {
+        Ok(())
+    }
+
+    fn supports_data_type(&self, data_type: &DataType) -> Result<(), 
ArrowError> {
+        // Note don't check for metadata/value fields here because they may be
+        // absent in shredded variants
+        if matches!(data_type, DataType::Struct(_)) {
+            Ok(())

Review Comment:
   There are a few cases:
   * Pass `as_type=None` -- this would preserve existing shredding structure, 
if any
   * Pass `as_type=Some(... VariantType...)` -- what schema should the 
underlying struct have, to:
      * Force shredding as a specific schema
      * Force unshredding to binary
      * Return whatever was there (equivalent to passing None)
   * Pass `as_type=Some(StructType(... VariantType ...))` -- this requests to 
shred the variant value as a struct, but that the field(s) inside that struct 
are variant-typed (not shredded). The same three questions apply, but now 
`as_type=None` is not a fallback
      * We had talked previously about a providing function that could return 
the shredding schema for a given path, so the user can stub in the 
variant-typed fields however they want. But that's pretty fiddly.
      * The alternative (which I proposed above) was to allow annotating the 
variant type to indicate that `variant_get` should preserve the underlying 
shredding structure, without having to actually specify it. 
      * Much cleaner, and actually simpler for the implementation as well. Once 
the field's path is exhausted, just return whatever was there (similar to how 
`as_type=None` is implemented today); if we passed an explicit shredding schema 
instead, then the implementation would have to examine every node of the 
schema, all the way to leaf level, in case the provided shredding structure 
doesn't actually match the underlying.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to