alamb commented on code in PR #7452:
URL: https://github.com/apache/arrow-rs/pull/7452#discussion_r2080303787


##########
arrow-variant/src/variant.rs:
##########
@@ -0,0 +1,418 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Core Variant data type for working with the Arrow Variant binary format.
+
+use crate::decoder;
+use arrow_schema::ArrowError;
+use std::fmt;
+
+/// A Variant value in the Arrow binary format
+#[derive(Debug, Clone, PartialEq)]
+pub struct Variant<'a> {

Review Comment:
   This is a good start as it does not copy the values. However I think it may 
have a few issues:
   1. The lifetimes (`'a`) are the same for the Value and the Metadata which I 
think will make sharing metadata across multiple variants potentially tricky
   2. There is no way to use `match` effectively to switch on variant type. 
Instead, I think it needs methods like `is_object` or `is_array` 
   
   As written I think these structures follow a pattern that is more common on 
Java or C++ (which is fine, but if we are going to make a native Rust library I 
think it is worth following standard rust Idioms)
   
   I wonder if you considered the structure contemplated here: 
https://github.com/apache/arrow-rs/issues/7423
   
   Specifically this structure:
   
   ```rust
   /// Variant value. May contain references to metadata and value
   /// 'a is lifetime for metadata
   /// 'b is lifetime for value
   pub enum Variant<'a, 'b> {
     Variant::Null,
     Variant::Int8
     ...
     // strings are stored in the value and thus have references to that value
     Variant::String(&'b str),
     Variant::ShortString(&'b str),
     // Objects and Arrays need the metadata and values, so store both.
     Variant::Object(VariantObject<'a, 'b>),
     VariantArray(VariantArray<'a, 'b>)
   }
   
   /// Wrapper over Variant Metadata
   pub struct VariantMetadata<'a> {
     metadata: &'a[u8],
     // perhaps access to header fields like dict length and is_sorted
   }
   
   /// Represents a Variant Object with references to the underlying metadata
   /// and value fields
   pub enum VariantObject<'a, 'b> {
     // pointer to metadata
     metadata: VariantMetadata<'a>,
     // pointer to value
     value: &'a [u8],
   }
   ```
   



##########
arrow-variant/src/variant.rs:
##########
@@ -0,0 +1,418 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Core Variant data type for working with the Arrow Variant binary format.
+
+use crate::decoder;
+use arrow_schema::ArrowError;
+use std::fmt;
+
+/// A Variant value in the Arrow binary format

Review Comment:
   Technically Variant is part of the Parquet spec, not part of the Arrow spec 
🤷 
   
   ```suggestion
   /// A Variant value in the Parquet binary format
   ```



##########
arrow-variant/src/variant.rs:
##########
@@ -0,0 +1,418 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Core Variant data type for working with the Arrow Variant binary format.
+
+use crate::decoder;
+use arrow_schema::ArrowError;
+use std::fmt;
+
+/// A Variant value in the Arrow binary format
+#[derive(Debug, Clone, PartialEq)]
+pub struct Variant<'a> {
+    /// Raw metadata bytes
+    metadata: &'a [u8],
+    /// Raw value bytes
+    value: &'a [u8],
+}
+
+impl<'a> Variant<'a> {
+    /// Creates a new Variant with metadata and value bytes
+    pub fn new(metadata: &'a [u8], value: &'a [u8]) -> Self {
+        Self { metadata, value }
+    }
+
+    /// Creates a Variant by parsing binary metadata and value
+    pub fn try_new(metadata: &'a [u8], value: &'a [u8]) -> Result<Self, 
ArrowError> {
+        // Validate that the binary data is a valid Variant
+        decoder::validate_variant(value, metadata)?;
+
+        Ok(Self { metadata, value })
+    }
+
+    /// Returns the raw metadata bytes
+    pub fn metadata(&self) -> &'a [u8] {
+        self.metadata
+    }
+
+    /// Returns the raw value bytes
+    pub fn value(&self) -> &'a [u8] {
+        self.value
+    }
+
+    /// Gets a value by key from an object Variant
+    ///
+    /// Returns:
+    /// - `Ok(Some(Variant))` if the key exists
+    /// - `Ok(None)` if the key doesn't exist or the Variant is not an object
+    /// - `Err` if there was an error parsing the Variant
+    pub fn get(&self, key: &str) -> Result<Option<Variant<'a>>, ArrowError> {
+        let result = decoder::get_field_value_range(self.value, self.metadata, 
key)?;
+        Ok(result.map(|(start, end)| Variant {
+            metadata: self.metadata,        // Share the same metadata 
reference
+            value: &self.value[start..end], // Use a slice of the original 
value buffer
+        }))
+    }
+
+    /// Gets a value by index from an array Variant
+    ///
+    /// Returns:
+    /// - `Ok(Some(Variant))` if the index is valid
+    /// - `Ok(None)` if the index is out of bounds or the Variant is not an 
array
+    /// - `Err` if there was an error parsing the Variant
+    pub fn get_index(&self, index: usize) -> Result<Option<Variant<'a>>, 
ArrowError> {
+        let result = decoder::get_array_element_range(self.value, index)?;
+        Ok(result.map(|(start, end)| Variant {
+            metadata: self.metadata,        // Share the same metadata 
reference
+            value: &self.value[start..end], // Use a slice of the original 
value buffer
+        }))
+    }
+
+    /// Checks if this Variant is an object
+    pub fn is_object(&self) -> Result<bool, ArrowError> {
+        decoder::is_object(self.value)
+    }
+
+    /// Checks if this Variant is an array
+    pub fn is_array(&self) -> Result<bool, ArrowError> {
+        decoder::is_array(self.value)
+    }
+
+    /// Converts the variant value to a serde_json::Value
+    pub fn as_value(&self) -> Result<serde_json::Value, ArrowError> {
+        let keys = crate::decoder::parse_metadata_keys(self.metadata)?;
+        crate::decoder::decode_value(self.value, &keys)

Review Comment:
   I think we should try and avoid using the JSON representation when decoding 
Variant values for such a low level API because:
   1. They are likely inefficient (e.g. to access a Variant value it needs to 
be converted from bytes --> Json --> Variant)
   2. They can't represent the types fully (e.g. there is no way to represent 
the different between Int and Float in JSON, all numbers are floats)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to