alamb commented on code in PR #8365:
URL: https://github.com/apache/arrow-rs/pull/8365#discussion_r2361187972
##########
parquet-variant-compute/src/variant_array.rs:
##########
@@ -24,12 +24,54 @@ use arrow::datatypes::{
Float16Type, Float32Type, Float64Type, Int16Type, Int32Type, Int64Type, Int8Type, UInt16Type,
UInt32Type, UInt64Type, UInt8Type,
};
+use arrow_schema::extension::ExtensionType;
use arrow_schema::{ArrowError, DataType, Field, FieldRef, Fields};
use parquet_variant::Uuid;
use parquet_variant::Variant;
use std::any::Any;
use std::sync::Arc;
+/// Variant Canonical Extension Type
+pub struct VariantType;
+
+impl ExtensionType for VariantType {
+ const NAME: &'static str = "parquet.variant";
+
+ // Variants have no extension metadata
+ type Metadata = ();
+
+ fn metadata(&self) -> &Self::Metadata {
+ &()
+ }
+
+ fn serialize_metadata(&self) -> Option<String> {
+ None
+ }
+
+ fn deserialize_metadata(_metadata: Option<&str>) -> Result<Self::Metadata, ArrowError> {
+ Ok(())
+ }
+
+ fn supports_data_type(&self, data_type: &DataType) -> Result<(), ArrowError> {
+ // Note don't check for metadata/value fields here because they may be
+ // absent in shredded variants
+ if matches!(data_type, DataType::Struct(_)) {
+ Ok(())
Review Comment:
> I don't see how cast_to_variant would help in that case?
I don't think it would (I misunderstood what you were getting at)
> But in that case, we have a (potentially deeply) nested struct with multiple variant leaf fields, and fetching individual fields one at a time would be pretty annoying.
> The difficulty comes if the caller just wants to get back whatever flavor of variant is already there (without shredding or unshredding it first).
I am confused -- if we are parsing JSON, then the result will always be an unshredded variant, so I am not sure how there would be different flavors of variant.
Maybe this is a good topic for a call or some examples.
Also, I wonder if we should open a new ticket, as this conversation is likely to get lost here.
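To make the "flavors" question concrete, here is a minimal, self-contained sketch. It is not code from this PR. `DataType` below is a hypothetical stand-in for `arrow_schema::DataType`, the `flavor` helper and `Flavor` enum are invented for illustration, and the field names `metadata`, `value`, and `typed_value` are assumed from the Parquet variant shredding layout. It mirrors the struct-only check in `supports_data_type` above and shows why that check alone cannot tell a shredded variant from an unshredded one.

```rust
// Hypothetical stand-in for arrow_schema::DataType (only what this sketch needs).
#[derive(Debug)]
enum DataType {
    Binary,
    Int64,
    Struct(Vec<(String, DataType)>),
}

// Illustrative classification; not part of the arrow-rs API.
#[derive(Debug, PartialEq)]
enum Flavor {
    Unshredded,
    Shredded,
}

// Mirrors the check in the diff: any struct is accepted, nothing else.
fn supports_data_type(data_type: &DataType) -> Result<(), String> {
    if matches!(data_type, DataType::Struct(_)) {
        Ok(())
    } else {
        Err(format!("expected a struct, got {data_type:?}"))
    }
}

// Assumption: a `typed_value` field marks a shredded variant, and a struct
// with `value` but no `typed_value` is unshredded.
fn flavor(data_type: &DataType) -> Option<Flavor> {
    let DataType::Struct(fields) = data_type else {
        return None;
    };
    let has = |name: &str| fields.iter().any(|(n, _)| n.as_str() == name);
    if has("typed_value") {
        Some(Flavor::Shredded)
    } else if has("value") {
        Some(Flavor::Unshredded)
    } else {
        None
    }
}

fn main() {
    let unshredded = DataType::Struct(vec![
        ("metadata".to_string(), DataType::Binary),
        ("value".to_string(), DataType::Binary),
    ]);
    let shredded = DataType::Struct(vec![
        ("metadata".to_string(), DataType::Binary),
        ("typed_value".to_string(), DataType::Int64),
    ]);
    let not_a_struct = DataType::Int64;

    for dt in [&unshredded, &shredded, &not_a_struct] {
        println!(
            "accepted: {}, flavor: {:?}",
            supports_data_type(dt).is_ok(),
            flavor(dt)
        );
    }
}
```

Under these assumptions, both struct layouts pass the extension type check, so a caller who wants "whatever flavor is already there" would have to inspect the fields separately.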
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.