alamb commented on code in PR #8365:
URL: https://github.com/apache/arrow-rs/pull/8365#discussion_r2360625046
##########
parquet-variant-compute/src/variant_array.rs:
##########
@@ -24,12 +24,54 @@ use arrow::datatypes::{
Float16Type, Float32Type, Float64Type, Int16Type, Int32Type, Int64Type,
Int8Type, UInt16Type,
UInt32Type, UInt64Type, UInt8Type,
};
+use arrow_schema::extension::ExtensionType;
use arrow_schema::{ArrowError, DataType, Field, FieldRef, Fields};
use parquet_variant::Uuid;
use parquet_variant::Variant;
use std::any::Any;
use std::sync::Arc;
+/// Variant Canonical Extension Type
+pub struct VariantType;
+
+impl ExtensionType for VariantType {
+ const NAME: &'static str = "parquet.variant";
+
+ // Variants have no extension metadata
+ type Metadata = ();
+
+ fn metadata(&self) -> &Self::Metadata {
+ &()
+ }
+
+ fn serialize_metadata(&self) -> Option<String> {
+ None
+ }
+
+ fn deserialize_metadata(_metadata: Option<&str>) -> Result<Self::Metadata,
ArrowError> {
+ Ok(())
+ }
+
+ fn supports_data_type(&self, data_type: &DataType) -> Result<(),
ArrowError> {
+ // Note don't check for metadata/value fields here because they may be
+ // absent in shredded variants
+ if matches!(data_type, DataType::Struct(_)) {
+ Ok(())
Review Comment:
> Once the field's path is exhausted, just return whatever was there
(similar to how as_type=None is implemented today); if we passed an explicit
shredding schema instead, then the implementation would have to examine every
node of the schema, all the way to leaf level, in case the provided shredding
structure doesn't actually match the underlying.
This is sort of what I expected to happen today.
Maybe we could use the `cast_to_variant` kernel for this? For example, if
I had a VariantArray with some arbitrary shredding, I could do:
```rust
let array: VariantArray = ...; // input somehow
// get the "foo" field (may be shredded).
let foo = variant_get(array, Path::from("foo"))?
// ensure foo is non shredded
let unshredded_foo = cast_to_variant(foo)?;
```
🤔
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]