scovich commented on code in PR #7452:
URL: https://github.com/apache/arrow-rs/pull/7452#discussion_r2095809708


##########
arrow-variant/src/variant.rs:
##########


Review Comment:
   I have not reviewed the code carefully at all yet, and what follows is a 
general observation based on the inherent nature of variant data and Rust 
notions of safety:
   
   It will be really tempting to have "efficient" code that e.g. uses 
[from_utf8_unchecked](https://doc.rust-lang.org/std/primitive.str.html#method.from_utf8_unchecked)
 to extract a `&str` from a `&[u8]`, or to use indexing operations like `v[10]` 
to extract bytes. But variant data is generally untrusted user input and 
whatever `Variant` struct/enum we define will become the first -- and often 
only -- line of defense against malicious or malformed input. 
   
   Hopefully we can code carefully, with the goal that sizes and/or contents of 
metadata and value slices will never cause a panic? 
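   
   For instance, here is a minimal sketch of panic-free accessors that use `slice::get` and `str::from_utf8` instead of indexing and `from_utf8_unchecked`. The helper names `byte_at` and `str_at` are hypothetical, and errors are plain `String`s only to keep the sketch self-contained:
   
   ```rust
   use std::str;
   
   /// Hypothetical helper: read one byte at `pos` without panicking.
   fn byte_at(data: &[u8], pos: usize) -> Result<u8, String> {
       data.get(pos)
           .copied()
           .ok_or_else(|| format!("offset {pos} is out of bounds (len {})", data.len()))
   }
   
   /// Hypothetical helper: borrow `len` bytes starting at `start` as a `&str`,
   /// rejecting both out-of-range slices and invalid UTF-8.
   fn str_at(data: &[u8], start: usize, len: usize) -> Result<&str, String> {
       // checked_add guards against `start + len` overflowing on hostile input
       let end = start
           .checked_add(len)
           .ok_or_else(|| "slice end overflows usize".to_string())?;
       let bytes = data
           .get(start..end)
           .ok_or_else(|| format!("range {start}..{end} is out of bounds (len {})", data.len()))?;
       str::from_utf8(bytes).map_err(|e| format!("invalid UTF-8: {e}"))
   }
   
   fn main() {
       let value = [0x01u8, b'h', b'i', 0xFF];
       assert_eq!(byte_at(&value, 0), Ok(0x01));
       assert!(byte_at(&value, 10).is_err()); // `value[10]` would panic instead
       assert_eq!(str_at(&value, 1, 2), Ok("hi"));
       assert!(str_at(&value, 3, 1).is_err()); // 0xFF is not valid UTF-8
       assert!(str_at(&value, 2, usize::MAX).is_err()); // overflow, not a panic
   }
   ```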
   
   Additionally, it seems like we have a few choices for values such as strings 
and decimals, where even a right-sized byte slice can contain invalid values:
   1. Return obviously unvalidated values, e.g. `&[u8]` instead of `&str` for 
strings, and `&[u8]` instead of whatever `VariantDecimal` struct we might 
otherwise define -- leaving the user responsible to finish the conversion as 
(un)safely as they deem prudent.
   2. Return ostensibly validated values, with (safe) checked and (unsafe) 
unchecked constructors and/or getters that let the user choose the one they 
deem appropriate.
   
   I personally favor the latter approach (safe and easy to use, even if not 
always maximally efficient), but the topic probably needs a wider 
discussion.
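   
   Here is a rough sketch of what option 2 could look like for strings. `VariantStr` is a hypothetical name, not a type in this PR. The safe constructor validates UTF-8, while the `unsafe` one shifts that obligation onto the caller:
   
   ```rust
   use std::str::{self, Utf8Error};
   
   /// Hypothetical validated wrapper around the bytes of a variant string.
   #[derive(Debug, Clone, Copy, PartialEq, Eq)]
   pub struct VariantStr<'a>(&'a str);
   
   impl<'a> VariantStr<'a> {
       /// Safe constructor: validates UTF-8 and reports failure instead of panicking.
       pub fn try_new(bytes: &'a [u8]) -> Result<Self, Utf8Error> {
           str::from_utf8(bytes).map(VariantStr)
       }
   
       /// Unchecked constructor for callers that have already validated `bytes`.
       ///
       /// # Safety
       /// `bytes` must be valid UTF-8.
       pub unsafe fn new_unchecked(bytes: &'a [u8]) -> Self {
           // SAFETY: the caller upholds the UTF-8 requirement documented above.
           VariantStr(unsafe { str::from_utf8_unchecked(bytes) })
       }
   
       pub fn as_str(&self) -> &'a str {
           self.0
       }
   }
   
   fn main() {
       let ok = VariantStr::try_new(b"hello").unwrap();
       assert_eq!(ok.as_str(), "hello");
       assert!(VariantStr::try_new(&[0xC3, 0x28]).is_err()); // invalid 2-byte sequence
   
       // Only sound because the literal is known to be ASCII.
       let fast = unsafe { VariantStr::new_unchecked(b"world") };
       assert_eq!(fast.as_str(), "world");
   }
   ```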



##########
arrow-variant/src/variant.rs:
##########
@@ -0,0 +1,418 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Core Variant data type for working with the Arrow Variant binary format.
+
+use crate::decoder;
+use arrow_schema::ArrowError;
+use std::fmt;
+
+/// A Variant value in the Arrow binary format
+#[derive(Debug, Clone, PartialEq)]
+pub struct Variant<'a> {
+    /// Raw metadata bytes
+    metadata: &'a [u8],
+    /// Raw value bytes
+    value: &'a [u8],
+}
+
+impl<'a> Variant<'a> {
+    /// Creates a new Variant with metadata and value bytes
+    pub fn new(metadata: &'a [u8], value: &'a [u8]) -> Self {

Review Comment:
   This should be `new_unchecked`?
   
   (but if we made it an enum as suggested above, the method would just go away 
-- internal code can directly create the desired enum variant if it knows all 
invariants hold)
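   
   To illustrate the enum idea, here is a hypothetical and heavily trimmed `Variant` enum whose only entry point is a checked decoder. After validating a byte range, internal code builds the enum variant directly, so no unchecked constructor is needed. The header layout follows the PR's `get_basic_type` and `get_primitive_type`: the basic type sits in the low 2 bits, and the primitive type id or short-string length sits in the upper 6 bits. The sketch covers only a handful of types:
   
   ```rust
   /// Hypothetical enum form of `Variant`; only a few types are shown.
   #[derive(Debug, Clone, Copy, PartialEq)]
   pub enum Variant<'a> {
       Null,
       Bool(bool),
       Int8(i8),
       String(&'a str),
   }
   
   impl<'a> Variant<'a> {
       /// Checked decoding: every byte access is bounds-checked, so malformed
       /// input yields `Err` rather than a panic.
       pub fn try_decode(value: &'a [u8]) -> Result<Self, String> {
           let (&header, rest) = value.split_first().ok_or("empty value")?;
           match header & 0x03 {
               // basic type 0: primitive, type id in the upper 6 bits
               0 => match header >> 2 {
                   0 => Ok(Variant::Null),
                   1 => Ok(Variant::Bool(true)),
                   2 => Ok(Variant::Bool(false)),
                   3 => {
                       let &b = rest.first().ok_or("truncated int8")?;
                       Ok(Variant::Int8(b as i8))
                   }
                   id => Err(format!("unsupported primitive type id {id}")),
               },
               // basic type 1: short string, length in the upper 6 bits
               1 => {
                   let len = usize::from(header >> 2);
                   let bytes = rest.get(..len).ok_or("truncated short string")?;
                   let s = std::str::from_utf8(bytes).map_err(|e| e.to_string())?;
                   Ok(Variant::String(s))
               }
               other => Err(format!("basic type {other} not covered by this sketch")),
           }
       }
   }
   
   fn main() {
       assert_eq!(Variant::try_decode(&[0x00]), Ok(Variant::Null));
       assert_eq!(Variant::try_decode(&[0x04]), Ok(Variant::Bool(true)));
       assert_eq!(Variant::try_decode(&[0x0C, 0xFF]), Ok(Variant::Int8(-1)));
       // short string "hi": (2 << 2) | 1 = 0x09
       assert_eq!(Variant::try_decode(&[0x09, b'h', b'i']), Ok(Variant::String("hi")));
       assert!(Variant::try_decode(&[]).is_err());
       assert!(Variant::try_decode(&[0x0C]).is_err()); // int8 missing its payload
       assert!(Variant::try_decode(&[0x0D, b'h']).is_err()); // claims 3 bytes, has 1
   }
   ```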



##########
arrow-variant/src/variant.rs:
##########
@@ -0,0 +1,418 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Core Variant data type for working with the Arrow Variant binary format.
+
+use crate::decoder;
+use arrow_schema::ArrowError;
+use std::fmt;
+
+/// A Variant value in the Arrow binary format
+#[derive(Debug, Clone, PartialEq)]
+pub struct Variant<'a> {
+    /// Raw metadata bytes
+    metadata: &'a [u8],
+    /// Raw value bytes
+    value: &'a [u8],
+}
+
+impl<'a> Variant<'a> {
+    /// Creates a new Variant with metadata and value bytes
+    pub fn new(metadata: &'a [u8], value: &'a [u8]) -> Self {
+        Self { metadata, value }
+    }
+
+    /// Creates a Variant by parsing binary metadata and value
+    pub fn try_new(metadata: &'a [u8], value: &'a [u8]) -> Result<Self, ArrowError> {
+        // Validate that the binary data is a valid Variant
+        decoder::validate_variant(value, metadata)?;
+
+        Ok(Self { metadata, value })
+    }
+
+    /// Returns the raw metadata bytes
+    pub fn metadata(&self) -> &'a [u8] {
+        self.metadata
+    }
+
+    /// Returns the raw value bytes
+    pub fn value(&self) -> &'a [u8] {
+        self.value
+    }
+
+    /// Gets a value by key from an object Variant
+    ///
+    /// Returns:
+    /// - `Ok(Some(Variant))` if the key exists
+    /// - `Ok(None)` if the key doesn't exist or the Variant is not an object
+    /// - `Err` if there was an error parsing the Variant
+    pub fn get(&self, key: &str) -> Result<Option<Variant<'a>>, ArrowError> {
+        let result = decoder::get_field_value_range(self.value, self.metadata, key)?;
+        Ok(result.map(|(start, end)| Variant {
+            metadata: self.metadata,        // Share the same metadata reference
+            value: &self.value[start..end], // Use a slice of the original value buffer
+        }))
+    }
+
+    /// Gets a value by index from an array Variant
+    ///
+    /// Returns:
+    /// - `Ok(Some(Variant))` if the index is valid
+    /// - `Ok(None)` if the index is out of bounds or the Variant is not an array
+    /// - `Err` if there was an error parsing the Variant
+    pub fn get_index(&self, index: usize) -> Result<Option<Variant<'a>>, ArrowError> {
+        let result = decoder::get_array_element_range(self.value, index)?;
+        Ok(result.map(|(start, end)| Variant {
+            metadata: self.metadata,        // Share the same metadata reference
+            value: &self.value[start..end], // Use a slice of the original value buffer
+        }))
+    }
+
+    /// Checks if this Variant is an object
+    pub fn is_object(&self) -> Result<bool, ArrowError> {
+        decoder::is_object(self.value)
+    }
+
+    /// Checks if this Variant is an array
+    pub fn is_array(&self) -> Result<bool, ArrowError> {
+        decoder::is_array(self.value)
+    }
+
+    /// Converts the variant value to a serde_json::Value
+    pub fn as_value(&self) -> Result<serde_json::Value, ArrowError> {
+        let keys = crate::decoder::parse_metadata_keys(self.metadata)?;
+        crate::decoder::decode_value(self.value, &keys)
+    }
+
+    /// Converts the variant value to a string.
+    pub fn as_string(&self) -> Result<String, ArrowError> {
+        match self.as_value()? {
+            serde_json::Value::String(s) => Ok(s),
+            serde_json::Value::Number(n) => Ok(n.to_string()),
+            serde_json::Value::Bool(b) => Ok(b.to_string()),
+            serde_json::Value::Null => Ok("null".to_string()),
+            _ => Err(ArrowError::InvalidArgumentError(
+                "Cannot convert value to string".to_string(),
+            )),
+        }
+    }
+
+    /// Converts the variant value to a i32.
+    pub fn as_i32(&self) -> Result<i32, ArrowError> {
+        match self.as_value()? {
+            serde_json::Value::Number(n) => {
+                if let Some(i) = n.as_i64() {
+                    if i >= i32::MIN as i64 && i <= i32::MAX as i64 {
+                        return Ok(i as i32);
+                    }
+                }
+                Err(ArrowError::InvalidArgumentError(
+                    "Number outside i32 range".to_string(),
+                ))
+            }
+            _ => Err(ArrowError::InvalidArgumentError(
+                "Cannot convert value to i32".to_string(),
+            )),
+        }
+    }
+
+    /// Converts the variant value to a i64.
+    pub fn as_i64(&self) -> Result<i64, ArrowError> {

Review Comment:
   Once we introduce an enum version of `Variant`, we open up the question of 
type conversions. I guess we would want support for automatic type widening? 
e.g. something like: 
   
   <details>
   
   ```rust
   pub fn as_i64(&self) -> Result<Option<i64>, ArrowError> {
       use Variant::*;
       // `self` is a reference, so bindings are references: deref before widening
       let val = match self {
           Null => return Ok(None),
           Int64(val) => *val,
           Int32(val) => i64::from(*val),
           Int16(val) => i64::from(*val),
           Int8(val) => i64::from(*val),
           Decimal4(d) if d.scale() == 0 => i64::from(d.unscaled_value()),
           Decimal8(d) if d.scale() == 0 => d.unscaled_value(),
           _ => return Err(...),
       };
       Ok(Some(val))
   }
   
   pub fn as_f64(&self) -> Result<Option<f64>, ArrowError> {
       use Variant::*;
       let val = match self {
           Null => return Ok(None),
           Int32(val) => f64::from(*val),
           Int16(val) => f64::from(*val),
           Int8(val) => f64::from(*val),
           // apply the scale: the unscaled integer alone is off by 10^scale
           Decimal4(d) => f64::from(d.unscaled_value()) / 10f64.powi(i32::from(d.scale())),
           Float(val) => f64::from(*val),
           _ => return Err(...),
       };
       Ok(Some(val))
   }
   
   pub fn as_decimal16(&self, scale: u8) -> Result<Option<VariantDecimal16>, ArrowError> {
       use Variant::*;
       let (old_scale, unscaled_value) = match self {
           Null => return Ok(None),
           Decimal16(d) if d.scale() <= scale => (d.scale(), d.unscaled_value()),
           Decimal8(d) if d.scale() <= scale => (d.scale(), i128::from(d.unscaled_value())),
           Decimal4(d) if d.scale() <= scale => (d.scale(), i128::from(d.unscaled_value())),
           Int64(val) => (0, i128::from(*val)),
           Int32(val) => (0, i128::from(*val)),
           Int16(val) => (0, i128::from(*val)),
           Int8(val) => (0, i128::from(*val)),
           _ => return Err(...),
       };
       Ok(Some(VariantDecimal16::try_new(scale, old_scale, unscaled_value)?))
   }
   ```
   
   The above assumes something like:
   ```rust
   impl VariantDecimal16 {
       fn try_new(scale: u8, current_scale: u8, unscaled_value: i128) -> Result<Self, ArrowError> {
           if scale > 38 || current_scale > 38 || scale < current_scale {
               return Err(...);
           }
           // Rescale by multiplying by 10^(scale - current_scale), failing on overflow
           let exponent = u32::from(scale - current_scale);
           let Some(unscaled_value) = 10i128
               .checked_pow(exponent)
               .and_then(|factor| unscaled_value.checked_mul(factor))
           else {
               return Err(...);
           };
           Ok(Self { scale, unscaled_value })
       }
   }
   ```
   
   </details>
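   
   Here is a quick standalone check of the rescaling step that `VariantDecimal16::try_new` needs to perform. Widening from `current_scale` to `scale` means multiplying the unscaled value by 10^(scale - current_scale), and overflow should be reported rather than wrapped. `rescale` is a hypothetical free function. It checks only the scale bounds, not the 38-digit precision limit on the unscaled value:
   
   ```rust
   /// Hypothetical helper: rescale `unscaled` from `current_scale` to `scale`
   /// by multiplying by 10^(scale - current_scale). Returns `None` when the
   /// scale would shrink, exceeds 38, or the multiplication overflows i128.
   fn rescale(unscaled: i128, current_scale: u8, scale: u8) -> Option<i128> {
       if scale > 38 || current_scale > scale {
           return None;
       }
       let factor = 10i128.checked_pow(u32::from(scale - current_scale))?;
       unscaled.checked_mul(factor)
   }
   
   fn main() {
       // 1.23 at scale 2 becomes 1.2300 at scale 4
       assert_eq!(rescale(123, 2, 4), Some(12300));
       assert_eq!(rescale(-5, 0, 3), Some(-5000));
       // narrowing the scale would lose digits, so it is rejected
       assert_eq!(rescale(123, 2, 1), None);
       // overflow is reported instead of wrapping
       assert_eq!(rescale(i128::MAX, 0, 1), None);
   }
   ```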



##########
arrow-variant/src/decoder/mod.rs:
##########
@@ -0,0 +1,1563 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Decoder module for converting Variant binary format to JSON values
+use crate::encoder::{VariantBasicType, VariantPrimitiveType};
+use arrow_schema::ArrowError;
+use indexmap::IndexMap;
+#[allow(unused_imports)]
+use serde_json::{json, Map, Value};
+#[allow(unused_imports)]
+use std::collections::HashMap;
+use std::str;
+
+/// Decodes a Variant binary value to a JSON value
+pub fn decode_value(value: &[u8], keys: &[String]) -> Result<Value, ArrowError> {
+    println!("Decoding value of length: {}", value.len());
+    let mut pos = 0;
+    let result = decode_value_internal(value, &mut pos, keys)?;
+    println!("Decoded value: {:?}", result);
+    Ok(result)
+}
+
+/// Extracts the basic type from a header byte
+fn get_basic_type(header: u8) -> VariantBasicType {
+    match header & 0x03 {
+        0 => VariantBasicType::Primitive,
+        1 => VariantBasicType::ShortString,
+        2 => VariantBasicType::Object,
+        3 => VariantBasicType::Array,
+        _ => unreachable!(),
+    }
+}
+
+/// Extracts the primitive type from a header byte
+fn get_primitive_type(header: u8) -> VariantPrimitiveType {
+    match (header >> 2) & 0x3F {
+        0 => VariantPrimitiveType::Null,
+        1 => VariantPrimitiveType::BooleanTrue,
+        2 => VariantPrimitiveType::BooleanFalse,
+        3 => VariantPrimitiveType::Int8,
+        4 => VariantPrimitiveType::Int16,
+        5 => VariantPrimitiveType::Int32,
+        6 => VariantPrimitiveType::Int64,
+        7 => VariantPrimitiveType::Double,
+        8 => VariantPrimitiveType::Decimal4,
+        9 => VariantPrimitiveType::Decimal8,
+        10 => VariantPrimitiveType::Decimal16,
+        11 => VariantPrimitiveType::Date,
+        12 => VariantPrimitiveType::Timestamp,
+        13 => VariantPrimitiveType::TimestampNTZ,
+        14 => VariantPrimitiveType::Float,
+        15 => VariantPrimitiveType::Binary,
+        16 => VariantPrimitiveType::String,
+        17 => VariantPrimitiveType::TimeNTZ,
+        18 => VariantPrimitiveType::TimestampNanos,
+        19 => VariantPrimitiveType::TimestampNTZNanos,
+        20 => VariantPrimitiveType::Uuid,
+        _ => unreachable!(),
+    }
+}
+
+/// Extracts object header information
+fn get_object_header_info(header: u8) -> (bool, u8, u8) {
+    let header = (header >> 2) & 0x3F; // Get header bits
+    let is_large = (header >> 4) & 0x01 != 0; // is_large from bit 4
+    let id_size = ((header >> 2) & 0x03) + 1; // field_id_size from bits 2-3
+    let offset_size = (header & 0x03) + 1; // offset_size from bits 0-1
+    (is_large, id_size, offset_size)
+}
+
+/// Extracts array header information
+fn get_array_header_info(header: u8) -> (bool, u8) {
+    let header = (header >> 2) & 0x3F; // Get header bits
+    let is_large = (header >> 2) & 0x01 != 0; // is_large from bit 2
+    let offset_size = (header & 0x03) + 1; // offset_size from bits 0-1
+    (is_large, offset_size)
+}
+
+/// Reads an unsigned integer of the specified size
+fn read_unsigned(data: &[u8], pos: &mut usize, size: u8) -> Result<usize, ArrowError> {
+    if *pos + (size as usize - 1) >= data.len() {
+        return Err(ArrowError::InvalidArgumentError(format!(
+            "Unexpected end of data for {} byte unsigned integer",
+            size
+        )));
+    }
+
+    let mut value = 0usize;
+    for i in 0..size {
+        value |= (data[*pos + i as usize] as usize) << (8 * i);
+    }
+    *pos += size as usize;
+
+    Ok(value)
+}
+
+/// Internal recursive function to decode a value at the current position
+fn decode_value_internal(
+    data: &[u8],
+    pos: &mut usize,
+    keys: &[String],
+) -> Result<Value, ArrowError> {
+    if *pos >= data.len() {
+        return Err(ArrowError::InvalidArgumentError(
+            "Unexpected end of data".to_string(),
+        ));
+    }
+
+    let header = data[*pos];
+    println!(
+        "Decoding at position {}: header byte = 0x{:02X}",
+        *pos, header
+    );
+    *pos += 1;
+
+    match get_basic_type(header) {
+        VariantBasicType::Primitive => match get_primitive_type(header) {
+            VariantPrimitiveType::Null => Ok(Value::Null),
+            VariantPrimitiveType::BooleanTrue => Ok(Value::Bool(true)),
+            VariantPrimitiveType::BooleanFalse => Ok(Value::Bool(false)),
+            VariantPrimitiveType::Int8 => decode_int8(data, pos),
+            VariantPrimitiveType::Int16 => decode_int16(data, pos),
+            VariantPrimitiveType::Int32 => decode_int32(data, pos),
+            VariantPrimitiveType::Int64 => decode_int64(data, pos),
+            VariantPrimitiveType::Double => decode_double(data, pos),
+            VariantPrimitiveType::Decimal4 => decode_decimal4(data, pos),
+            VariantPrimitiveType::Decimal8 => decode_decimal8(data, pos),
+            VariantPrimitiveType::Decimal16 => decode_decimal16(data, pos),
+            VariantPrimitiveType::Date => decode_date(data, pos),
+            VariantPrimitiveType::Timestamp => decode_timestamp(data, pos),
+            VariantPrimitiveType::TimestampNTZ => decode_timestamp_ntz(data, pos),
+            VariantPrimitiveType::Float => decode_float(data, pos),
+            VariantPrimitiveType::Binary => decode_binary(data, pos),
+            VariantPrimitiveType::String => decode_long_string(data, pos),
+            VariantPrimitiveType::TimeNTZ => decode_time_ntz(data, pos),
+            VariantPrimitiveType::TimestampNanos => decode_timestamp_nanos(data, pos),
+            VariantPrimitiveType::TimestampNTZNanos => decode_timestamp_ntz_nanos(data, pos),
+            VariantPrimitiveType::Uuid => decode_uuid(data, pos),
+        },
+        VariantBasicType::ShortString => {
+            let len = (header >> 2) & 0x3F;
+            println!("Short string with length: {}", len);
+            if *pos + len as usize > data.len() {
+                return Err(ArrowError::InvalidArgumentError(
+                    "Unexpected end of data for short string".to_string(),
+                ));
+            }
+
+            let string_bytes = &data[*pos..*pos + len as usize];
+            *pos += len as usize;
+
+            let string = str::from_utf8(string_bytes)
+                .map_err(|e| ArrowError::SchemaError(format!("Invalid UTF-8 string: {}", e)))?;
+
+            Ok(Value::String(string.to_string()))
+        }
+        VariantBasicType::Object => {
+            let (is_large, id_size, offset_size) = get_object_header_info(header);
+            println!(
+                "Object header: is_large={}, id_size={}, offset_size={}",
+                is_large, id_size, offset_size
+            );
+
+            // Read number of elements
+            let num_elements = if is_large {
+                read_unsigned(data, pos, 4)?
+            } else {
+                read_unsigned(data, pos, 1)?
+            };
+            println!("Object has {} elements", num_elements);
+
+            // Read field IDs
+            let mut field_ids = Vec::with_capacity(num_elements);
+            for _ in 0..num_elements {
+                field_ids.push(read_unsigned(data, pos, id_size)?);
+            }
+            println!("Field IDs: {:?}", field_ids);
+
+            // Read offsets
+            let mut offsets = Vec::with_capacity(num_elements + 1);
+            for _ in 0..=num_elements {
+                offsets.push(read_unsigned(data, pos, offset_size)?);
+            }
+            println!("Offsets: {:?}", offsets);
+
+            // Create object and save position after offsets
+            let mut obj = Map::new();
+            let base_pos = *pos;
+
+            // Process each field
+            for i in 0..num_elements {
+                let field_id = field_ids[i];
+                if field_id >= keys.len() {
+                    return Err(ArrowError::InvalidArgumentError(format!(
+                        "Field ID out of range: {}",
+                        field_id
+                    )));
+                }
+
+                let field_name = &keys[field_id];
+                let start_offset = offsets[i];
+                let end_offset = offsets[i + 1];
+
+                println!(
+                    "Field {}: {} (ID: {}), range: {}..{}",
+                    i,
+                    field_name,
+                    field_id,
+                    base_pos + start_offset,
+                    base_pos + end_offset
+                );
+
+                if base_pos + end_offset > data.len() {
+                    return Err(ArrowError::SchemaError(
+                        "Unexpected end of data for object field".to_string(),
+                    ));
+                }
+
+                // Create a slice just for this field and decode it
+                let field_data = &data[base_pos + start_offset..base_pos + end_offset];
+                let mut field_pos = 0;
+                let value = decode_value_internal(field_data, &mut field_pos, keys)?;
+
+                obj.insert(field_name.clone(), value);
+            }
+
+            // Update position to end of object data
+            *pos = base_pos + offsets[num_elements];
+            Ok(Value::Object(obj))
+        }
+        VariantBasicType::Array => {
+            let (is_large, offset_size) = get_array_header_info(header);
+            println!(
+                "Array header: is_large={}, offset_size={}",
+                is_large, offset_size
+            );
+
+            // Read number of elements
+            let num_elements = if is_large {
+                read_unsigned(data, pos, 4)?
+            } else {
+                read_unsigned(data, pos, 1)?
+            };
+            println!("Array has {} elements", num_elements);
+
+            // Read offsets
+            let mut offsets = Vec::with_capacity(num_elements + 1);
+            for _ in 0..=num_elements {
+                offsets.push(read_unsigned(data, pos, offset_size)?);
+            }
+            println!("Offsets: {:?}", offsets);
+
+            // Create array and save position after offsets
+            let mut array = Vec::with_capacity(num_elements);
+            let base_pos = *pos;
+
+            // Process each element
+            for i in 0..num_elements {
+                let start_offset = offsets[i];
+                let end_offset = offsets[i + 1];
+
+                println!(
+                    "Element {}: range: {}..{}",
+                    i,
+                    base_pos + start_offset,
+                    base_pos + end_offset
+                );
+
+                if base_pos + end_offset > data.len() {
+                    return Err(ArrowError::SchemaError(
+                        "Unexpected end of data for array element".to_string(),
+                    ));
+                }
+
+                // Create a slice just for this element and decode it
+                let elem_data = &data[base_pos + start_offset..base_pos + end_offset];
+                let mut elem_pos = 0;
+                let value = decode_value_internal(elem_data, &mut elem_pos, keys)?;
+
+                array.push(value);
+            }
+
+            // Update position to end of array data
+            *pos = base_pos + offsets[num_elements];
+            Ok(Value::Array(array))
+        }
+    }
+}
+
+/// Decodes a null value
+#[allow(dead_code)]
+fn decode_null() -> Result<Value, ArrowError> {
+    Ok(Value::Null)
+}
+
+/// Decodes a primitive value
+#[allow(dead_code)]
+fn decode_primitive(data: &[u8], pos: &mut usize) -> Result<Value, ArrowError> {
+    if *pos >= data.len() {
+        return Err(ArrowError::InvalidArgumentError(
+            "Unexpected end of data for primitive".to_string(),
+        ));
+    }
+
+    // Read the primitive type header
+    let header = data[*pos];
+    *pos += 1;
+
+    // Extract primitive type ID
+    let type_id = header & 0x1F;
+
+    // Decode based on primitive type
+    match type_id {
+        0 => decode_null(),
+        1 => Ok(Value::Bool(true)),
+        2 => Ok(Value::Bool(false)),
+        3 => decode_int8(data, pos),
+        4 => decode_int16(data, pos),
+        5 => decode_int32(data, pos),
+        6 => decode_int64(data, pos),
+        7 => decode_double(data, pos),
+        8 => decode_decimal4(data, pos),
+        9 => decode_decimal8(data, pos),
+        10 => decode_decimal16(data, pos),
+        11 => decode_date(data, pos),
+        12 => decode_timestamp(data, pos),
+        13 => decode_timestamp_ntz(data, pos),
+        14 => decode_float(data, pos),
+        15 => decode_binary(data, pos),
+        16 => decode_long_string(data, pos),
+        17 => decode_time_ntz(data, pos),
+        18 => decode_timestamp_nanos(data, pos),
+        19 => decode_timestamp_ntz_nanos(data, pos),
+        20 => decode_uuid(data, pos),
+        _ => Err(ArrowError::SchemaError(format!(
+            "Unknown primitive type ID: {}",
+            type_id
+        ))),
+    }
+}
+
+/// Decodes a short string value
+#[allow(dead_code)]
+fn decode_short_string(data: &[u8], pos: &mut usize) -> Result<Value, ArrowError> {
+    if *pos >= data.len() {
+        return Err(ArrowError::InvalidArgumentError(
+            "Unexpected end of data for short string length".to_string(),
+        ));
+    }
+
+    // Read the string length (1 byte)
+    let len = data[*pos] as usize;
+    *pos += 1;
+
+    // Read the string bytes
+    if *pos + len > data.len() {
+        return Err(ArrowError::InvalidArgumentError(
+            "Unexpected end of data for short string content".to_string(),
+        ));
+    }
+
+    let string_bytes = &data[*pos..*pos + len];
+    *pos += len;
+
+    // Convert to UTF-8 string
+    let string = str::from_utf8(string_bytes)
+        .map_err(|e| ArrowError::SchemaError(format!("Invalid UTF-8 string: {}", e)))?;
+
+    Ok(Value::String(string.to_string()))
+}
+
+/// Decodes an int8 value
+fn decode_int8(data: &[u8], pos: &mut usize) -> Result<Value, ArrowError> {
+    if *pos >= data.len() {
+        return Err(ArrowError::InvalidArgumentError(
+            "Unexpected end of data for int8".to_string(),
+        ));
+    }
+
+    let value = data[*pos] as i8 as i64;
+    *pos += 1;
+
+    Ok(Value::Number(serde_json::Number::from(value)))
+}
+
+/// Decodes an int16 value
+fn decode_int16(data: &[u8], pos: &mut usize) -> Result<Value, ArrowError> {
+    if *pos + 1 >= data.len() {
+        return Err(ArrowError::InvalidArgumentError(
+            "Unexpected end of data for int16".to_string(),
+        ));
+    }
+
+    let mut buf = [0u8; 2];
+    buf.copy_from_slice(&data[*pos..*pos + 2]);
+    *pos += 2;
+
+    let value = i16::from_le_bytes(buf) as i64;
+    Ok(Value::Number(serde_json::Number::from(value)))
+}
+
+/// Decodes an int32 value
+fn decode_int32(data: &[u8], pos: &mut usize) -> Result<Value, ArrowError> {
+    if *pos + 3 >= data.len() {
+        return Err(ArrowError::InvalidArgumentError(
+            "Unexpected end of data for int32".to_string(),
+        ));
+    }
+
+    let mut buf = [0u8; 4];
+    buf.copy_from_slice(&data[*pos..*pos + 4]);
+    *pos += 4;
+
+    let value = i32::from_le_bytes(buf) as i64;
+    Ok(Value::Number(serde_json::Number::from(value)))
+}
+
+/// Decodes an int64 value
+fn decode_int64(data: &[u8], pos: &mut usize) -> Result<Value, ArrowError> {
+    if *pos + 7 >= data.len() {
+        return Err(ArrowError::InvalidArgumentError(
+            "Unexpected end of data for int64".to_string(),
+        ));
+    }
+
+    let mut buf = [0u8; 8];
+    buf.copy_from_slice(&data[*pos..*pos + 8]);
+    *pos += 8;
+
+    let value = i64::from_le_bytes(buf);
+    Ok(Value::Number(serde_json::Number::from(value)))
+}
+
+/// Decodes a double value
+fn decode_double(data: &[u8], pos: &mut usize) -> Result<Value, ArrowError> {
+    if *pos + 7 >= data.len() {
+        return Err(ArrowError::InvalidArgumentError(
+            "Unexpected end of data for double".to_string(),
+        ));
+    }
+
+    let mut buf = [0u8; 8];
+    buf.copy_from_slice(&data[*pos..*pos + 8]);
+    *pos += 8;
+
+    let value = f64::from_le_bytes(buf);
+
+    // Create a Number from the float
+    let number = serde_json::Number::from_f64(value)
+        .ok_or_else(|| ArrowError::SchemaError(format!("Invalid float value: {}", value)))?;
+
+    Ok(Value::Number(number))
+}
+
+/// Decodes a decimal4 value
+fn decode_decimal4(data: &[u8], pos: &mut usize) -> Result<Value, ArrowError> {
+    if *pos + 4 > data.len() {
+        return Err(ArrowError::InvalidArgumentError(
+            "Unexpected end of data for decimal4".to_string(),
+        ));
+    }
+
+    // Read scale (1 byte)
+    let scale = data[*pos];
+    *pos += 1;
+
+    // Read unscaled value (4 bytes)
+    let mut buf = [0u8; 4];
+    buf.copy_from_slice(&data[*pos..*pos + 4]);
+    *pos += 4;
+
+    let unscaled = i32::from_le_bytes(buf);
+
+    // Correctly scale the value: divide by 10^scale
+    let scaled = (unscaled as f64) / 10f64.powi(scale as i32);
+
+    // Format as JSON number
+    let number = serde_json::Number::from_f64(scaled)
+        .ok_or_else(|| ArrowError::SchemaError(format!("Invalid decimal value: {}", scaled)))?;
+
+    Ok(Value::Number(number))
+}
+
+/// Decodes a decimal8 value
+fn decode_decimal8(data: &[u8], pos: &mut usize) -> Result<Value, ArrowError> {
+    if *pos + 8 > data.len() {
+        return Err(ArrowError::InvalidArgumentError(
+            "Unexpected end of data for decimal8".to_string(),
+        ));
+    }
+
+    let scale = data[*pos] as i32;
+    *pos += 1;
+
+    let mut buf = [0u8; 8];
+    buf[..7].copy_from_slice(&data[*pos..*pos + 7]);
+    buf[7] = if (buf[6] & 0x80) != 0 { 0xFF } else { 0x00 };
+    *pos += 7;
+
+    let unscaled = i64::from_le_bytes(buf);
+    let value = (unscaled as f64) / 10f64.powi(scale);
+
+    Ok(Value::Number(
+        serde_json::Number::from_f64(value)
+            .ok_or_else(|| ArrowError::ParseError("Invalid f64 from decimal8".to_string()))?,
+    ))
+}
+
+/// Decodes a decimal16 value
+fn decode_decimal16(data: &[u8], pos: &mut usize) -> Result<Value, ArrowError> {
+    if *pos + 16 > data.len() {
+        return Err(ArrowError::InvalidArgumentError(
+            "Unexpected end of data for decimal16".to_string(),
+        ));
+    }
+
+    let scale = data[*pos] as i32;
+    *pos += 1;
+
+    let mut buf = [0u8; 16];
+    buf[..15].copy_from_slice(&data[*pos..*pos + 15]);
+    buf[15] = if (buf[14] & 0x80) != 0 { 0xFF } else { 0x00 };
+    *pos += 15;
+
+    let unscaled = i128::from_le_bytes(buf);
+    let s = format!(
+        "{}.{:0>width$}",
+        unscaled / 10i128.pow(scale as u32),
+        (unscaled.abs() % 10i128.pow(scale as u32)),
+        width = scale as usize
+    );
+
+    Ok(Value::String(s))
+}
+
+/// Decodes a date value
+fn decode_date(data: &[u8], pos: &mut usize) -> Result<Value, ArrowError> {
+    if *pos + 3 >= data.len() {
+        return Err(ArrowError::InvalidArgumentError(
+            "Unexpected end of data for date".to_string(),
+        ));
+    }
+
+    let mut buf = [0u8; 4];
+    buf.copy_from_slice(&data[*pos..*pos + 4]);
+    *pos += 4;
+
+    let days = i32::from_le_bytes(buf);
+
+    // Convert to ISO date string (simplified)
+    let date = format!("date-{}", days);
+
+    Ok(Value::String(date))
+}
+
+/// Decodes a timestamp value
+fn decode_timestamp(data: &[u8], pos: &mut usize) -> Result<Value, ArrowError> {
+    if *pos + 7 >= data.len() {
+        return Err(ArrowError::InvalidArgumentError(
+            "Unexpected end of data for timestamp".to_string(),
+        ));
+    }
+
+    let mut buf = [0u8; 8];
+    buf.copy_from_slice(&data[*pos..*pos + 8]);
+    *pos += 8;
+
+    let micros = i64::from_le_bytes(buf);
+
+    // Convert to ISO timestamp string (simplified)
+    let timestamp = format!("timestamp-{}", micros);
+
+    Ok(Value::String(timestamp))
+}
+
+/// Decodes a timestamp without timezone value
+fn decode_timestamp_ntz(data: &[u8], pos: &mut usize) -> Result<Value, ArrowError> {
+    if *pos + 7 >= data.len() {
+        return Err(ArrowError::InvalidArgumentError(
+            "Unexpected end of data for timestamp_ntz".to_string(),
+        ));
+    }
+
+    let mut buf = [0u8; 8];
+    buf.copy_from_slice(&data[*pos..*pos + 8]);
+    *pos += 8;
+
+    let micros = i64::from_le_bytes(buf);
+
+    // Convert to ISO timestamp string (simplified)
+    let timestamp = format!("timestamp_ntz-{}", micros);
+
+    Ok(Value::String(timestamp))
+}
+
+/// Decodes a float value
+fn decode_float(data: &[u8], pos: &mut usize) -> Result<Value, ArrowError> {
+    if *pos + 3 >= data.len() {
+        return Err(ArrowError::InvalidArgumentError(
+            "Unexpected end of data for float".to_string(),
+        ));
+    }
+
+    let mut buf = [0u8; 4];
+    buf.copy_from_slice(&data[*pos..*pos + 4]);
+    *pos += 4;
+
+    let value = f32::from_le_bytes(buf);
+
+    // Create a Number from the float
+    let number = serde_json::Number::from_f64(value as f64)
+        .ok_or_else(|| ArrowError::SchemaError(format!("Invalid float value: {}", value)))?;
+
+    Ok(Value::Number(number))
+}
+
+/// Decodes a binary value
+fn decode_binary(data: &[u8], pos: &mut usize) -> Result<Value, ArrowError> {
+    if *pos + 3 >= data.len() {
+        return Err(ArrowError::InvalidArgumentError(
+            "Unexpected end of data for binary length".to_string(),
+        ));
+    }
+
+    // Read the binary length (4 bytes)
+    let mut buf = [0u8; 4];
+    buf.copy_from_slice(&data[*pos..*pos + 4]);
+    *pos += 4;
+
+    let len = u32::from_le_bytes(buf) as usize;
+
+    // Read the binary bytes
+    if *pos + len > data.len() {
+        return Err(ArrowError::InvalidArgumentError(
+            "Unexpected end of data for binary content".to_string(),
+        ));
+    }
+
+    let binary_bytes = &data[*pos..*pos + len];
+    *pos += len;
+
+    // Convert to hex string instead of base64
+    let hex = binary_bytes
+        .iter()
+        .map(|b| format!("{:02x}", b))
+        .collect::<Vec<String>>()
+        .join("");
+
+    Ok(Value::String(format!("binary:{}", hex)))
+}
+
+/// Decodes a string value
+fn decode_long_string(data: &[u8], pos: &mut usize) -> Result<Value, ArrowError> {
+    if *pos + 3 >= data.len() {
+        return Err(ArrowError::InvalidArgumentError(
+            "Unexpected end of data for string length".to_string(),
+        ));
+    }
+
+    // Read the string length (4 bytes)
+    let mut buf = [0u8; 4];
+    buf.copy_from_slice(&data[*pos..*pos + 4]);
+    *pos += 4;
+
+    let len = u32::from_le_bytes(buf) as usize;
+
+    // Read the string bytes
+    if *pos + len > data.len() {
+        return Err(ArrowError::InvalidArgumentError(
+            "Unexpected end of data for string content".to_string(),
+        ));
+    }
+
+    let string_bytes = &data[*pos..*pos + len];
+    *pos += len;
+
+    // Convert to UTF-8 string
+    let string = str::from_utf8(string_bytes)
+        .map_err(|e| ArrowError::SchemaError(format!("Invalid UTF-8 string: {}", e)))?;
+
+    Ok(Value::String(string.to_string()))
+}
+
+/// Decodes a time without timezone value
+fn decode_time_ntz(data: &[u8], pos: &mut usize) -> Result<Value, ArrowError> {
+    if *pos + 7 >= data.len() {
+        return Err(ArrowError::InvalidArgumentError(
+            "Unexpected end of data for time_ntz".to_string(),
+        ));
+    }
+
+    let mut buf = [0u8; 8];
+    buf.copy_from_slice(&data[*pos..*pos + 8]);
+    *pos += 8;
+
+    let micros = i64::from_le_bytes(buf);
+
+    // Convert to ISO time string (simplified)
+    let time = format!("time_ntz-{}", micros);
+
+    Ok(Value::String(time))
+}
+
+/// Decodes a timestamp with timezone (nanos) value
+fn decode_timestamp_nanos(data: &[u8], pos: &mut usize) -> Result<Value, ArrowError> {
+    if *pos + 7 >= data.len() {
+        return Err(ArrowError::InvalidArgumentError(
+            "Unexpected end of data for timestamp_nanos".to_string(),
+        ));
+    }
+
+    let mut buf = [0u8; 8];
+    buf.copy_from_slice(&data[*pos..*pos + 8]);
+    *pos += 8;
+
+    let nanos = i64::from_le_bytes(buf);
+
+    // Convert to ISO timestamp string (simplified)
+    let timestamp = format!("timestamp_nanos-{}", nanos);
+
+    Ok(Value::String(timestamp))
+}
+
+/// Decodes a timestamp without timezone (nanos) value
+fn decode_timestamp_ntz_nanos(data: &[u8], pos: &mut usize) -> Result<Value, ArrowError> {
+    if *pos + 7 >= data.len() {
+        return Err(ArrowError::InvalidArgumentError(
+            "Unexpected end of data for timestamp_ntz_nanos".to_string(),
+        ));
+    }
+
+    let mut buf = [0u8; 8];
+    buf.copy_from_slice(&data[*pos..*pos + 8]);
+    *pos += 8;
+
+    let nanos = i64::from_le_bytes(buf);
+
+    // Convert to ISO timestamp string (simplified)
+    let timestamp = format!("timestamp_ntz_nanos-{}", nanos);
+
+    Ok(Value::String(timestamp))
+}
+
+/// Decodes a UUID value
+fn decode_uuid(data: &[u8], pos: &mut usize) -> Result<Value, ArrowError> {
+    if *pos + 15 >= data.len() {
+        return Err(ArrowError::InvalidArgumentError(
+            "Unexpected end of data for uuid".to_string(),
+        ));
+    }
+
+    let mut buf = [0u8; 16];
+    buf.copy_from_slice(&data[*pos..*pos + 16]);
+    *pos += 16;
+
+    // Convert to UUID string (simplified)
+    let uuid = format!("uuid-{:?}", buf);
+
+    Ok(Value::String(uuid))
+}
+
+/// Decodes a Variant binary to a JSON value using the given metadata
+pub fn decode_json(binary: &[u8], metadata: &[u8]) -> Result<Value, ArrowError> {
+    let keys = parse_metadata_keys(metadata)?;
+    decode_value(binary, &keys)
+}
+
+/// A helper struct to simplify metadata dictionary handling
+struct MetadataDictionary {
+    keys: Vec<String>,
+    key_to_id: IndexMap<String, usize>,
+}
+
+impl MetadataDictionary {
+    fn new(metadata: &[u8]) -> Result<Self, ArrowError> {
+        let keys = parse_metadata_keys(metadata)?;
+
+        // Build key to id mapping for faster lookups
+        let mut key_to_id = IndexMap::new();
+        for (i, key) in keys.iter().enumerate() {
+            key_to_id.insert(key.clone(), i);
+        }
+
+        Ok(Self { keys, key_to_id })
+    }
+
+    fn get_field_id(&self, key: &str) -> Option<usize> {
+        self.key_to_id.get(key).copied()
+    }
+
+    fn get_key(&self, id: usize) -> Option<&str> {
+        self.keys.get(id).map(|s| s.as_str())
+    }
+}
+
+/// Parses metadata to extract the key list
+pub fn parse_metadata_keys(metadata: &[u8]) -> Result<Vec<String>, ArrowError> {
+    if metadata.is_empty() {
+        // Return empty key list if no metadata
+        return Ok(Vec::new());
+    }
+
+    // Parse header
+    let header = metadata[0];
+    let version = header & 0x0F;
+    let _sorted = (header >> 4) & 0x01 != 0;
+    let offset_size_minus_one = (header >> 6) & 0x03;
+    let offset_size = (offset_size_minus_one + 1) as usize;
+
+    if version != 1 {
+        return Err(ArrowError::SchemaError(format!(
+            "Unsupported version: {}",
+            version
+        )));
+    }
+
+    if metadata.len() < 1 + offset_size {
+        return Err(ArrowError::SchemaError(
+            "Metadata too short for dictionary size".to_string(),
+        ));
+    }
+
+    // Parse dictionary_size
+    let mut dictionary_size = 0u32;
+    for i in 0..offset_size {
+        dictionary_size |= (metadata[1 + i] as u32) << (8 * i);
+    }
+
+    // Early return if dictionary is empty
+    if dictionary_size == 0 {
+        return Ok(Vec::new());
+    }
+
+    // Parse offsets
+    let offset_start = 1 + offset_size;
+    let offset_end = offset_start + (dictionary_size as usize + 1) * offset_size;
+
+    if metadata.len() < offset_end {
+        return Err(ArrowError::SchemaError(
+            "Metadata too short for offsets".to_string(),
+        ));
+    }
+
+    let mut offsets = Vec::with_capacity(dictionary_size as usize + 1);
+    for i in 0..=dictionary_size {
+        let offset_pos = offset_start + (i as usize * offset_size);
+        let mut offset = 0u32;
+        for j in 0..offset_size {
+            offset |= (metadata[offset_pos + j] as u32) << (8 * j);
+        }
+        offsets.push(offset as usize);
+    }
+
+    // Parse dictionary strings
+    let mut keys = Vec::with_capacity(dictionary_size as usize);
+
+    for i in 0..dictionary_size as usize {
+        let start = offset_end + offsets[i];
+        let end = offset_end + offsets[i + 1];
+
+        if end > metadata.len() {
+            return Err(ArrowError::SchemaError(format!(
+                "Invalid string offset: start={}, end={}, metadata_len={}",
+                start,
+                end,
+                metadata.len()
+            )));
+        }
+
+        let key = str::from_utf8(&metadata[start..end])
+            .map_err(|e| ArrowError::SchemaError(format!("Invalid UTF-8: {}", e)))?
+            .to_string();
+
+        keys.push(key);
+    }
+
+    println!("Parsed metadata keys: {:?}", keys);
+
+    Ok(keys)
+}
+
+/// Validates that the binary data represents a valid Variant
+/// Returns error if the format is invalid
+pub fn validate_variant(value: &[u8], metadata: &[u8]) -> Result<(), ArrowError> {
+    // Check if metadata is valid
+    let keys = parse_metadata_keys(metadata)?;
+
+    // Try to decode the value using the metadata to validate the format
+    let mut pos = 0;
+    decode_value_internal(value, &mut pos, &keys)?;
+
+    Ok(())
+}
+
+/// Checks if the variant is an object
+pub fn is_object(value: &[u8]) -> Result<bool, ArrowError> {
+    if value.is_empty() {
+        return Err(ArrowError::InvalidArgumentError(
+            "Empty value data".to_string(),
+        ));
+    }
+
+    let header = value[0];
+    let basic_type = get_basic_type(header);
+
+    Ok(matches!(basic_type, VariantBasicType::Object))
+}
+
+/// Checks if the variant is an array
+pub fn is_array(value: &[u8]) -> Result<bool, ArrowError> {
+    if value.is_empty() {
+        return Err(ArrowError::InvalidArgumentError(
+            "Empty value data".to_string(),
+        ));
+    }
+
+    let header = value[0];
+    let basic_type = get_basic_type(header);
+
+    Ok(matches!(basic_type, VariantBasicType::Array))
+}
+
+/// Formats a variant value as a string for debugging purposes
+pub fn format_variant_value(value: &[u8], metadata: &[u8]) -> Result<String, ArrowError> {
+    if value.is_empty() {
+        return Ok("null".to_string());
+    }
+
+    let keys = parse_metadata_keys(metadata)?;
+    let mut pos = 0;
+    let json_value = decode_value_internal(value, &mut pos, &keys)?;
+
+    // Return the JSON string representation
+    Ok(json_value.to_string())
+}
+
+/// Gets a field value range from an object variant
+pub fn get_field_value_range(
+    value: &[u8],
+    metadata: &[u8],
+    key: &str,
+) -> Result<Option<(usize, usize)>, ArrowError> {

Review Comment:
   Why not return an actual `Range<usize>`?
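   
   Returning `Option<Range<usize>>` would also let callers write `value.get(range)` directly. A minimal sketch, assuming hypothetical helpers `read_offset` and `checked_range` (neither is in the PR) that build such a range from untrusted offset bytes without any indexing that could panic:
   
   ```rust
   use std::ops::Range;
   
   /// Hypothetical helper: reads a little-endian offset of `size` bytes
   /// (1 to 4, as in the Variant offset encoding) starting at `pos`.
   /// Returns `None` instead of panicking if the bytes are missing.
   fn read_offset(data: &[u8], pos: usize, size: usize) -> Option<usize> {
       if !(1..=4).contains(&size) {
           return None;
       }
       let bytes = data.get(pos..pos.checked_add(size)?)?;
       let mut offset = 0usize;
       for (i, b) in bytes.iter().enumerate() {
           offset |= usize::from(*b) << (8 * i);
       }
       Some(offset)
   }
   
   /// Hypothetical helper: accepts a (start, end) pair only if it forms a
   /// well-ordered range that fits inside a buffer of length `len`.
   fn checked_range(start: usize, end: usize, len: usize) -> Option<Range<usize>> {
       (start <= end && end <= len).then_some(start..end)
   }
   ```
   
   Once checked, the range can be used as `&value[range]`; alternatively `value.get(start..end)` performs the same two checks and returns `None` on failure.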



##########
arrow-variant/src/variant.rs:
##########
@@ -0,0 +1,418 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Core Variant data type for working with the Arrow Variant binary format.
+
+use crate::decoder;
+use arrow_schema::ArrowError;
+use std::fmt;
+
+/// A Variant value in the Arrow binary format
+#[derive(Debug, Clone, PartialEq)]
+pub struct Variant<'a> {

Review Comment:
   A few notes:
   * Can we use `'m` and `'v` as self-documenting lifetimes?
   * `String(&'m str)` and `ShortString(&'v str)` have different lifetimes
   * The enum variants for most types need args. It's probably nicer to track decoded values (i32, f64, etc.) rather than slices of little-endian bytes?
   * Decimal will need some kind of design?
   * UUID would be handled by `Uuid([u8; 16])`, because a slice reference would also take 16 bytes (pointer plus length on a 64-bit target)?
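   
   A quick check of the size claim, with a hypothetical `VariantSketch` enum (not the PR's type) that also uses the proposed `'m`/`'v` lifetimes. A `&[u8]` is always two `usize`s, so on 64-bit targets it is the same 16 bytes as an inline `[u8; 16]`:
   
   ```rust
   use std::mem::size_of;
   
   /// Hypothetical enum: `'m` borrows from the metadata buffer and `'v`
   /// from the value buffer, so the two lifetimes are independent.
   #[allow(dead_code)]
   #[derive(Debug, Clone, Copy, PartialEq)]
   enum VariantSketch<'m, 'v> {
       Null,
       Int64(i64),
       ShortString(&'v str),
       // Stored inline; no larger than a `&'v [u8]` on 64-bit targets.
       Uuid([u8; 16]),
       Object { metadata: &'m [u8], value: &'v [u8] },
   }
   
   /// Size of a borrowed slice reference: pointer plus length.
   fn slice_ref_size() -> usize {
       size_of::<&[u8]>()
   }
   ```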
   
   <details>
   <summary>Possible VariantDecimal type?</summary>
   
   ```rust
   // NOTE: This should be a sealed trait
   trait UnscaledDecimalValue: Copy {
       const MAX_SCALE: u8;
   }
   impl UnscaledDecimalValue for i32 {
       const MAX_SCALE: u8 = 9; // 31*log10(2)
   }
   impl UnscaledDecimalValue for i64 {
       const MAX_SCALE: u8 = 18; // 63*log10(2)
   }
   impl UnscaledDecimalValue for i128 {
       const MAX_SCALE: u8 = 38; // 127*log10(2)
   }
   pub struct VariantDecimal<U: UnscaledDecimalValue> {
       scale: u8,
       unscaled_value: U,
   }
   impl<U: UnscaledDecimalValue> VariantDecimal<U> {
       pub fn try_new(scale: u8, unscaled_value: U) -> Result<Self, ArrowError> {
           if scale <= U::MAX_SCALE {
               Ok(Self { scale, unscaled_value })
           } else {
               Err(...)
           }
       }
       pub fn scale(&self) -> u8 {
           self.scale
       }
       pub fn unscaled_value(&self) -> U {
           self.unscaled_value
       }
   }
   
   pub enum Variant<'m, 'v> {
         ...
       Decimal4(VariantDecimal<i32>),
       Decimal8(VariantDecimal<i64>),
       Decimal16(VariantDecimal<i128>),
         ...
   }
   ```
   
   </details>
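   
   A compilable take on the `VariantDecimal` sketch, under a few assumptions: `String` stands in for `ArrowError` to keep it self-contained, sealing uses the usual private-module pattern, and the `Display` impl is only an illustration. Because `try_new` caps the scale at `MAX_SCALE`, the `pow` call cannot overflow, and formatting the sign separately keeps a value like -0.5 from printing as `0.5`. By contrast, `unscaled / 10i128.pow(scale as u32)` in the diff's `decode_decimal16` drops the sign whenever the integer part truncates to zero, and its `pow` can overflow for an out-of-range scale byte.
   
   ```rust
   use std::fmt;
   
   /// Sealed-trait pattern: `Sealed` lives in a private module, so only
   /// this crate can implement `UnscaledDecimalValue`.
   mod private {
       pub trait Sealed {}
       impl Sealed for i32 {}
       impl Sealed for i64 {}
       impl Sealed for i128 {}
   }
   
   pub trait UnscaledDecimalValue: private::Sealed + Copy + Into<i128> {
       const MAX_SCALE: u8;
   }
   impl UnscaledDecimalValue for i32 {
       const MAX_SCALE: u8 = 9;
   }
   impl UnscaledDecimalValue for i64 {
       const MAX_SCALE: u8 = 18;
   }
   impl UnscaledDecimalValue for i128 {
       const MAX_SCALE: u8 = 38;
   }
   
   #[derive(Debug, Clone, Copy, PartialEq)]
   pub struct VariantDecimal<U: UnscaledDecimalValue> {
       scale: u8,
       unscaled_value: U,
   }
   
   impl<U: UnscaledDecimalValue> VariantDecimal<U> {
       /// Rejects scales the unscaled type cannot represent.
       pub fn try_new(scale: u8, unscaled_value: U) -> Result<Self, String> {
           if scale <= U::MAX_SCALE {
               Ok(Self { scale, unscaled_value })
           } else {
               Err(format!("scale {} exceeds maximum {}", scale, U::MAX_SCALE))
           }
       }
       pub fn scale(&self) -> u8 {
           self.scale
       }
       pub fn unscaled_value(&self) -> U {
           self.unscaled_value
       }
   }
   
   impl<U: UnscaledDecimalValue> fmt::Display for VariantDecimal<U> {
       fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
           let v: i128 = self.unscaled_value.into();
           if self.scale == 0 {
               return write!(f, "{}", v);
           }
           // try_new guarantees scale <= 38, so 10^scale fits in a u128.
           let divisor = 10u128.pow(u32::from(self.scale));
           // Format the sign separately so that e.g. -5 at scale 1 prints "-0.5".
           let sign = if v < 0 { "-" } else { "" };
           let abs = v.unsigned_abs();
           write!(
               f,
               "{}{}.{:0width$}",
               sign,
               abs / divisor,
               abs % divisor,
               width = usize::from(self.scale)
           )
       }
   }
   ```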
   



##########
arrow-variant/src/variant.rs:
##########
@@ -0,0 +1,418 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Core Variant data type for working with the Arrow Variant binary format.
+
+use crate::decoder;
+use arrow_schema::ArrowError;
+use std::fmt;
+
+/// A Variant value in the Arrow binary format
+#[derive(Debug, Clone, PartialEq)]
+pub struct Variant<'a> {
+    /// Raw metadata bytes
+    metadata: &'a [u8],
+    /// Raw value bytes
+    value: &'a [u8],
+}
+
+impl<'a> Variant<'a> {
+    /// Creates a new Variant with metadata and value bytes
+    pub fn new(metadata: &'a [u8], value: &'a [u8]) -> Self {
+        Self { metadata, value }
+    }
+
+    /// Creates a Variant by parsing binary metadata and value
+    pub fn try_new(metadata: &'a [u8], value: &'a [u8]) -> Result<Self, ArrowError> {
+        // Validate that the binary data is a valid Variant
+        decoder::validate_variant(value, metadata)?;

Review Comment:
   If we make this an enum, then the constructor itself will naturally do most of the validation?
   
   <details>
   
   ```rust
   pub fn try_new(metadata: &'m [u8], value: &'v [u8]) -> Result<Self, ArrowError> {
       use Variant::*;
       let Some(&header) = value.first() else {
           return Err(...);
       };
       let basic_type = header & 0b11;
       let value_header = header >> 2;
       let result = match basic_type {
           0 => match value_header {
               0 => Null,
               1 => True,
               2 => False,
                 ...
               6 => Int64(i64::try_from_le_bytes(&value[1..])?),
               7 => Double(f64::try_from_le_bytes(&value[1..])?),
               8 => Decimal4(VariantDecimal4::try_new(metadata, &value[1..])?),
                 ...
               20 => Uuid(try_into_array(&value[1..])?),
               _ => return Err(...),
           },
           1 => {
               let len = usize::from(value_header);
               let value = &value[1..];
               if value.len() != len {
                   return Err(...);
               }
               ShortString(str::from_utf8(value)?)
           }
           2 => Object(VariantObject::try_new(metadata, &value[1..])?),
           3 => Array(VariantArray::try_new(metadata, &value[1..])?),
           _ => return Err(...),
       };
       Ok(result)
   }
   ```
   with helpers:
   <details>
   
   ```rust
   // Helper that converts TryFromSliceError into ArrowError
   fn try_into_array<const N: usize>(bytes: &[u8]) -> Result<[u8; N], ArrowError> {
       bytes.try_into().map_err(|_| ...)
   }
   
   // Expose the existing family of primitive `from_le_bytes` methods as a trait
   trait TryFromLittleEndianBytes<const N: usize>: Sized {
       fn try_from_le_bytes(bytes: &[u8]) -> Result<Self, ArrowError> {
           Ok(Self::from_le_bytes(try_into_array(bytes)?))
       }
   
       fn from_le_bytes(bytes: [u8; N]) -> Self;
   }
   
   macro_rules! TryFromLittleEndianBytes {
       ($ty:ty) => {
           const _: () = {
               const N: usize = std::mem::size_of::<$ty>();
               impl TryFromLittleEndianBytes<N> for $ty {
                   fn from_le_bytes(bytes: [u8; N]) -> $ty {
                       <$ty>::from_le_bytes(bytes)
                   }
               }
           };
       };
   }
   
   TryFromLittleEndianBytes!(i64);
   TryFromLittleEndianBytes!(f64);
   ```
   </details>
   
   </details>
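   
   A compilable sketch of the same approach, narrowed to the primitive type ids used above (0 null, 1 true, 2 false, 6 int64, 7 double, 20 uuid) plus short strings, with the basic type in the low 2 header bits and the type id or short-string length in the upper 6. Assumptions: `String` replaces `ArrowError`; objects, arrays, and the remaining primitives are left out; and each primitive must fill the remaining bytes exactly, as in the sketch. The names are hypothetical, not the PR's API.
   
   ```rust
   /// Hypothetical decoded value for a subset of the Variant types.
   #[allow(dead_code)]
   #[derive(Debug, Clone, Copy, PartialEq)]
   enum Variant<'v> {
       Null,
       True,
       False,
       Int64(i64),
       Double(f64),
       Uuid([u8; 16]),
       ShortString(&'v str),
   }
   
   /// Converts a slice into a fixed-size array, returning an error instead of
   /// panicking when the length does not match exactly.
   fn try_into_array<const N: usize>(bytes: &[u8]) -> Result<[u8; N], String> {
       bytes
           .try_into()
           .map_err(|_| format!("expected {} bytes, got {}", N, bytes.len()))
   }
   
   /// Decodes a single value; no code path indexes the input directly.
   fn try_new(value: &[u8]) -> Result<Variant<'_>, String> {
       // `split_first` yields None for empty input rather than panicking.
       let (&header, rest) = value.split_first().ok_or("empty value")?;
       let basic_type = header & 0b11;
       let value_header = header >> 2;
       match basic_type {
           // Primitive: the upper 6 header bits are the type id.
           0 => match value_header {
               0 => Ok(Variant::Null),
               1 => Ok(Variant::True),
               2 => Ok(Variant::False),
               6 => Ok(Variant::Int64(i64::from_le_bytes(try_into_array(rest)?))),
               7 => Ok(Variant::Double(f64::from_le_bytes(try_into_array(rest)?))),
               20 => Ok(Variant::Uuid(try_into_array(rest)?)),
               other => Err(format!("primitive type id {} not handled here", other)),
           },
           // Short string: the upper 6 header bits are the byte length.
           1 => {
               let len = usize::from(value_header);
               if rest.len() != len {
                   return Err(format!("short string: expected {} bytes, got {}", len, rest.len()));
               }
               let s = std::str::from_utf8(rest).map_err(|e| e.to_string())?;
               Ok(Variant::ShortString(s))
           }
           other => Err(format!("basic type {} not handled here", other)),
       }
   }
   ```
   
   Truncated input, a wrong short-string length, and invalid UTF-8 all come back as `Err` rather than a panic, which is the property the first review comment asks for.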



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

