scovich commented on code in PR #7644:
URL: https://github.com/apache/arrow-rs/pull/7644#discussion_r2140522645


##########
parquet-variant/src/decoder.rs:
##########
@@ -33,7 +34,18 @@ pub enum VariantPrimitiveType {
     BooleanTrue = 1,
     BooleanFalse = 2,
     Int8 = 3,
+    Int16 = 4,
+    Int32 = 5,
+    Int64 = 6,
+    Decimal4 = 8,

Review Comment:
   Double?



##########
parquet-variant/src/decoder.rs:
##########
@@ -33,7 +34,18 @@ pub enum VariantPrimitiveType {
     BooleanTrue = 1,
     BooleanFalse = 2,
     Int8 = 3,
+    Int16 = 4,
+    Int32 = 5,
+    Int64 = 6,
+    Decimal4 = 8,
+    Decimal8 = 9,
+    Decimal16 = 10,
     // TODO: Add types for the rest of primitives, once API is agreed upon
+    Date = 11,
+    TimestampMicros = 12,
+    TimestampNTZMicros = 13,

Review Comment:
   Acronyms and camel case are always awkward, but I _think_ we normally do 
init-caps instead of all-caps in rust?
   ```suggestion
       TimestampNtzMicros = 13,
   ```



##########
parquet-variant/src/decoder.rs:
##########
@@ -90,6 +113,89 @@ pub(crate) fn decode_int8(value: &[u8]) -> Result<i8, 
ArrowError> {
     Ok(value)
 }
 
+/// Decodes an Int16 from the value section of a variant.
+pub(crate) fn decode_int16(value: &[u8]) -> Result<i16, ArrowError> {
+    let value = i16::from_le_bytes(array_from_slice(value, 1)?);
+    Ok(value)

Review Comment:
   Single-use variable considered annoying if it doesn't help line breaks etc?
   ```suggestion
       Ok(i16::from_le_bytes(array_from_slice(value, 1)?))
   ```
   (several more below, and one already-existing above)



##########
parquet-variant/src/decoder.rs:
##########
@@ -90,6 +113,89 @@ pub(crate) fn decode_int8(value: &[u8]) -> Result<i8, 
ArrowError> {
     Ok(value)
 }
 
+/// Decodes an Int16 from the value section of a variant.
+pub(crate) fn decode_int16(value: &[u8]) -> Result<i16, ArrowError> {
+    let value = i16::from_le_bytes(array_from_slice(value, 1)?);
+    Ok(value)
+}
+
+/// Decodes an Int32 from the value section of a variant.
+pub(crate) fn decode_int32(value: &[u8]) -> Result<i32, ArrowError> {
+    let value = i32::from_le_bytes(array_from_slice(value, 1)?);
+    Ok(value)
+}
+
+/// Decodes an Int64 from the value section of a variant.
+pub(crate) fn decode_int64(value: &[u8]) -> Result<i64, ArrowError> {
+    let value = i64::from_le_bytes(array_from_slice(value, 1)?);
+    Ok(value)
+}
+
+/// Decodes a Decimal4 from the value section of a variant.
+pub(crate) fn decode_decimal4(value: &[u8]) -> Result<(i32, u8), ArrowError> {
+    let scale = u8::from_le_bytes(array_from_slice(value, 1)?);
+    let integer = i32::from_le_bytes(array_from_slice(value, 2)?);
+    Ok((integer, scale))
+}
+
+/// Decodes a Decimal8 from the value section of a variant.
+pub(crate) fn decode_decimal8(value: &[u8]) -> Result<(i64, u8), ArrowError> {
+    let scale = u8::from_le_bytes(array_from_slice(value, 1)?);
+    let integer = i64::from_le_bytes(array_from_slice(value, 2)?);
+    Ok((integer, scale))
+}
+
+/// Decodes a Decimal16 from the value section of a variant.
+pub(crate) fn decode_decimal16(value: &[u8]) -> Result<(i128, u8), ArrowError> 
{
+    let scale = u8::from_le_bytes(array_from_slice(value, 1)?);
+    let integer = i128::from_le_bytes(array_from_slice(value, 2)?);
+    Ok((integer, scale))
+}
+
+/// Decodes a Float from the value section of a variant.
+pub(crate) fn decode_float(value: &[u8]) -> Result<f32, ArrowError> {
+    let value = f32::from_le_bytes(array_from_slice(value, 1)?);
+    Ok(value)
+}
+
+/// Decodes a Date from the value section of a variant.
+pub(crate) fn decode_date(value: &[u8]) -> Result<NaiveDate, ArrowError> {
+    let days_since_epoch = i32::from_le_bytes(array_from_slice(value, 1)?);
+    let value = (DateTime::UNIX_EPOCH + Duration::days(days_since_epoch as 
i64)).date_naive();
+    Ok(value)

Review Comment:
   I'm not sure what the official rules are, but I prefer to avoid `as` casting 
of integral types as error-prone.
   ```suggestion
       let value = DateTime::UNIX_EPOCH + 
Duration::days(i64::from(days_since_epoch)));
       Ok(value.date_naive())
   ```
   (tl;dr: Somebody reading the code has to stop and think about whether the 
`as` is safe/correct, vs. `from` that is obviously safe)



##########
parquet-variant/src/decoder.rs:
##########
@@ -33,7 +34,18 @@ pub enum VariantPrimitiveType {
     BooleanTrue = 1,
     BooleanFalse = 2,
     Int8 = 3,
+    Int16 = 4,
+    Int32 = 5,
+    Int64 = 6,
+    Decimal4 = 8,
+    Decimal8 = 9,
+    Decimal16 = 10,
     // TODO: Add types for the rest of primitives, once API is agreed upon
+    Date = 11,
+    TimestampMicros = 12,
+    TimestampNTZMicros = 13,
+    Float = 14,
+    Binary = 15,
     String = 16,
 }

Review Comment:
   TimeNtz, TimestampNanos, TimestampNtzNanos? Uuid?
   
   (no strong opinion about whether this PR needs to support them, vs. future 
PR... but PR title and content should probably match?)



##########
parquet-variant/src/decoder.rs:
##########
@@ -90,6 +113,89 @@ pub(crate) fn decode_int8(value: &[u8]) -> Result<i8, 
ArrowError> {
     Ok(value)
 }
 
+/// Decodes an Int16 from the value section of a variant.
+pub(crate) fn decode_int16(value: &[u8]) -> Result<i16, ArrowError> {
+    let value = i16::from_le_bytes(array_from_slice(value, 1)?);
+    Ok(value)
+}
+
+/// Decodes an Int32 from the value section of a variant.
+pub(crate) fn decode_int32(value: &[u8]) -> Result<i32, ArrowError> {
+    let value = i32::from_le_bytes(array_from_slice(value, 1)?);
+    Ok(value)
+}
+
+/// Decodes an Int64 from the value section of a variant.
+pub(crate) fn decode_int64(value: &[u8]) -> Result<i64, ArrowError> {
+    let value = i64::from_le_bytes(array_from_slice(value, 1)?);
+    Ok(value)
+}
+
+/// Decodes a Decimal4 from the value section of a variant.
+pub(crate) fn decode_decimal4(value: &[u8]) -> Result<(i32, u8), ArrowError> {
+    let scale = u8::from_le_bytes(array_from_slice(value, 1)?);
+    let integer = i32::from_le_bytes(array_from_slice(value, 2)?);
+    Ok((integer, scale))
+}
+
+/// Decodes a Decimal8 from the value section of a variant.
+pub(crate) fn decode_decimal8(value: &[u8]) -> Result<(i64, u8), ArrowError> {
+    let scale = u8::from_le_bytes(array_from_slice(value, 1)?);
+    let integer = i64::from_le_bytes(array_from_slice(value, 2)?);
+    Ok((integer, scale))
+}
+
+/// Decodes a Decimal16 from the value section of a variant.
+pub(crate) fn decode_decimal16(value: &[u8]) -> Result<(i128, u8), ArrowError> 
{
+    let scale = u8::from_le_bytes(array_from_slice(value, 1)?);
+    let integer = i128::from_le_bytes(array_from_slice(value, 2)?);
+    Ok((integer, scale))
+}
+
+/// Decodes a Float from the value section of a variant.
+pub(crate) fn decode_float(value: &[u8]) -> Result<f32, ArrowError> {
+    let value = f32::from_le_bytes(array_from_slice(value, 1)?);
+    Ok(value)
+}
+
+/// Decodes a Date from the value section of a variant.
+pub(crate) fn decode_date(value: &[u8]) -> Result<NaiveDate, ArrowError> {
+    let days_since_epoch = i32::from_le_bytes(array_from_slice(value, 1)?);
+    let value = (DateTime::UNIX_EPOCH + Duration::days(days_since_epoch as 
i64)).date_naive();
+    Ok(value)
+}
+
+/// Decodes a TimestampMicros from the value section of a variant.
+pub(crate) fn decode_timestamp_micros(value: &[u8]) -> Result<DateTime<Utc>, 
ArrowError> {
+    let micros_since_epoch = i64::from_le_bytes(array_from_slice(value, 1)?);
+    if let Some(value) = DateTime::from_timestamp_micros(micros_since_epoch) {
+        Ok(value)
+    } else {
+        Err(ArrowError::CastError(format!(
+            "Could not cast `{micros_since_epoch}` microseconds into a 
DateTime<Utc>"
+        )))
+    }

Review Comment:
   `ok_or_else`?
   ```suggestion
       DateTime::from_timestamp_micros(micros_since_epoch).ok_or_else(|| {
           ArrowError::CastError(format!(
               "Could not convert `{micros_since_epoch}` to a microsecond 
timestamp"
           ))
       })
   ```
   (re: error message change, see below)



##########
parquet-variant/src/decoder.rs:
##########
@@ -112,14 +218,222 @@ mod tests {
     #[test]
     fn test_i8() -> Result<(), ArrowError> {
         let value = [
-            3 << 2, // Primitive type for i8
-            42,
+            (VariantPrimitiveType::Int8 as u8) << 2, // Basic type
+            0x2a,                                    // Data
         ];
         let result = decode_int8(&value)?;
         assert_eq!(result, 42);

Review Comment:
   Should we also change this to 0x2a?



##########
parquet-variant/src/decoder.rs:
##########
@@ -112,14 +218,222 @@ mod tests {
     #[test]
     fn test_i8() -> Result<(), ArrowError> {
         let value = [
-            3 << 2, // Primitive type for i8
-            42,
+            (VariantPrimitiveType::Int8 as u8) << 2, // Basic type
+            0x2a,                                    // Data
         ];
         let result = decode_int8(&value)?;
         assert_eq!(result, 42);
         Ok(())
     }
 
+    #[test]
+    fn test_i16() -> Result<(), ArrowError> {
+        let value = [
+            (VariantPrimitiveType::Int16 as u8) << 2, // Basic type
+            0xd2,
+            0x04, // Data
+        ];
+        let result = decode_int16(&value)?;
+        assert_eq!(result, 1234);

Review Comment:
   0x04d2?
   
   Writing the number as hex makes clear that the test is about getting the 
bytes where they belong. Otherwise, most humans will have to go to their 
calculator to check the hex for 1234.



##########
parquet-variant/src/decoder.rs:
##########
@@ -112,14 +218,222 @@ mod tests {
     #[test]
     fn test_i8() -> Result<(), ArrowError> {
         let value = [
-            3 << 2, // Primitive type for i8
-            42,
+            (VariantPrimitiveType::Int8 as u8) << 2, // Basic type
+            0x2a,                                    // Data
         ];
         let result = decode_int8(&value)?;
         assert_eq!(result, 42);
         Ok(())
     }
 
+    #[test]
+    fn test_i16() -> Result<(), ArrowError> {
+        let value = [
+            (VariantPrimitiveType::Int16 as u8) << 2, // Basic type
+            0xd2,
+            0x04, // Data
+        ];
+        let result = decode_int16(&value)?;
+        assert_eq!(result, 1234);
+        Ok(())
+    }
+
+    #[test]
+    fn test_i32() -> Result<(), ArrowError> {
+        let value = [
+            (VariantPrimitiveType::Int32 as u8) << 2, // Basic type
+            0x40,
+            0xe2,
+            0x01,
+            0x00, // Data
+        ];
+        let result = decode_int32(&value)?;
+        assert_eq!(result, 123456);
+        Ok(())

Review Comment:
   Something to consider: Unit tests that use `?` are hard to debug (don't get 
a line number for the actual failure, which can be "fun" if there's more than 
one `?` in the test. 
   ```suggestion
           assert_eq!(decode_int32(&value), Ok(123456));
   ```
   (tho in the general case, a test littered with `.unwrap()` calls can be 
annoying in a different way, so there's no really good answer unless/until rust 
comes up with a way for `?` to panic in tests)



##########
parquet-variant/src/variant.rs:
##########
@@ -402,11 +403,21 @@ pub enum Variant<'m, 'v> {
     // TODO: Add types for the rest of the primitive types, once API is agreed 
upon
     Null,
     Int8(i8),
-
+    Int16(i16),
+    Int32(i32),
+    Int64(i64),
+    Date(NaiveDate),
+    TimestampMicros(DateTime<Utc>),
+    TimestampNTZMicros(NaiveDateTime),
+    Decimal4 { integer: i32, scale: u8 },
+    Decimal8 { integer: i64, scale: u8 },
+    Decimal16 { integer: i128, scale: u8 },

Review Comment:
   It's hard to pass around the content of tuple/struct enum variants with 
multiple members -- you have to unpack them and pass scalar values around, 
which is bulky and error-prone (**). Should we consider creating a struct or 
family of structs for representing scalar decimal values?
   
   <details>
   <summary>(**)</summary>
   
    It would be lovely if `Variant::Decimal4` were an actual type in rust. It 
actually _is_ a type under the hood, but the language doesn't expose it as 
such, because the enum discriminant has to be stored somewhere and rust usually 
embeds it in the variant itself to save space. See 
https://rust-lang.github.io/rfcs/2195-really-tagged-unions.html#guide-level-explanation
 for gory details.
   </details>



##########
parquet-variant/src/variant.rs:
##########
@@ -483,6 +632,57 @@ impl<'m, 'v> From<i8> for Variant<'m, 'v> {
     }
 }
 
+impl<'m, 'v> From<i16> for Variant<'m, 'v> {
+    fn from(value: i16) -> Self {
+        Variant::Int16(value)
+    }
+}
+
+impl<'m, 'v> From<i32> for Variant<'m, 'v> {

Review Comment:
   I'm not sure how useful these `From` would be in practice, especially once a 
variant builder/encoder API lands?
   
   What use case(s) would they serve?
   Are they designed to make unit tests easier to write?
   Do we anticipate the variant builder/encoder would use them internally?
   
   Also:
      * `Variant::String` and `Variant::ShortString` are problematic because 
there's no way to e.g. embed one in a `Variant::Object` later on.
      * We anyway couldn't create `Variant::Object` or `Variant::Array` (they 
would need a mutable `VariantMetadataBuilder` to store field names in, etc.)
   



##########
parquet-variant/src/decoder.rs:
##########
@@ -112,14 +218,222 @@ mod tests {
     #[test]
     fn test_i8() -> Result<(), ArrowError> {
         let value = [
-            3 << 2, // Primitive type for i8
-            42,
+            (VariantPrimitiveType::Int8 as u8) << 2, // Basic type
+            0x2a,                                    // Data
         ];
         let result = decode_int8(&value)?;
         assert_eq!(result, 42);
         Ok(())
     }
 
+    #[test]
+    fn test_i16() -> Result<(), ArrowError> {
+        let value = [
+            (VariantPrimitiveType::Int16 as u8) << 2, // Basic type
+            0xd2,
+            0x04, // Data
+        ];
+        let result = decode_int16(&value)?;
+        assert_eq!(result, 1234);
+        Ok(())
+    }
+
+    #[test]
+    fn test_i32() -> Result<(), ArrowError> {
+        let value = [
+            (VariantPrimitiveType::Int32 as u8) << 2, // Basic type
+            0x40,
+            0xe2,
+            0x01,
+            0x00, // Data
+        ];
+        let result = decode_int32(&value)?;
+        assert_eq!(result, 123456);
+        Ok(())
+    }
+
+    #[test]
+    fn test_i64() -> Result<(), ArrowError> {
+        let value = [
+            (VariantPrimitiveType::Int64 as u8) << 2, // Basic type
+            0x15,
+            0x81,
+            0xe9,
+            0x7d,
+            0xf4,
+            0x10,
+            0x22,
+            0x11, // Data
+        ];
+        let result = decode_int64(&value)?;
+        assert_eq!(result, 1234567890123456789);
+        Ok(())
+    }
+
+    #[test]
+    fn test_decimal4() -> Result<(), ArrowError> {

Review Comment:
   That said... it would probably be very hard to test every possible way 
things could go wrong, so some kind of fuzz testing would probably be necessary 
to get beyond basic bad-input testing?



##########
parquet-variant/tests/variant_interop.rs:
##########
@@ -46,22 +47,22 @@ fn get_primitive_cases() -> Vec<(&'static str, 
Variant<'static, 'static>)> {
     // Cases are commented out
     // Enabling is tracked in  https://github.com/apache/arrow-rs/issues/7630
     vec![
-        // ("primitive_binary", Variant::Binary),
+        ("primitive_binary", Variant::Binary(&[0x03, 0x13, 0x37, 0xde, 0xad, 
0xbe, 0xef, 0xca, 0xfe])),
         ("primitive_boolean_false", Variant::BooleanFalse),
         ("primitive_boolean_true", Variant::BooleanTrue),
-        // ("primitive_date", Variant::Null),
-        //("primitive_decimal4", Variant::Null),
-        //("primitive_decimal8", Variant::Null),
-        //("primitive_decimal16", Variant::Null),
-        //("primitive_float", Variant::Null),
+        ("primitive_date", Variant::Date(NaiveDate::from_ymd_opt(2025, 4 , 
16).unwrap())),
+        ("primitive_decimal4", Variant::Decimal4{integer: 1234, scale: 2}),
+        ("primitive_decimal8", Variant::Decimal8{integer: 1234567890, scale: 
2}),
+        ("primitive_decimal16", Variant::Decimal16{integer: 
1234567891234567890, scale: 2}),
+        ("primitive_float", Variant::Float(1234567890.1234)),
         ("primitive_int8", Variant::Int8(42)),
-        //("primitive_int16", Variant::Null),
-        //("primitive_int32", Variant::Null),
-        //("primitive_int64", Variant::Null),
+        ("primitive_int16", Variant::Int16(1234)),
+        ("primitive_int32", Variant::Int32(123456)),
+        ("primitive_int64", Variant::Int64(1234567890123456789)),
         ("primitive_null", Variant::Null),
         ("primitive_string", Variant::String("This string is longer than 64 
bytes and therefore does not fit in a short_string and it also includes several 
non ascii characters such as 🐢, 💖, ♥\u{fe0f}, 🎣 and 🤦!!")),
-        //("primitive_timestamp", Variant::Null),
-        //("primitive_timestampntz", Variant::Null),
+        ("primitive_timestamp", 
Variant::TimestampMicros(NaiveDate::from_ymd_opt(2025, 4, 
16).unwrap().and_hms_milli_opt(16, 34, 56, 780).unwrap().and_utc())),

Review Comment:
   Those seem to be the same two values? Did I misread something?



##########
parquet-variant/src/decoder.rs:
##########
@@ -112,14 +218,222 @@ mod tests {
     #[test]
     fn test_i8() -> Result<(), ArrowError> {
         let value = [
-            3 << 2, // Primitive type for i8
-            42,
+            (VariantPrimitiveType::Int8 as u8) << 2, // Basic type
+            0x2a,                                    // Data
         ];
         let result = decode_int8(&value)?;
         assert_eq!(result, 42);
         Ok(())
     }
 
+    #[test]
+    fn test_i16() -> Result<(), ArrowError> {
+        let value = [
+            (VariantPrimitiveType::Int16 as u8) << 2, // Basic type
+            0xd2,
+            0x04, // Data
+        ];
+        let result = decode_int16(&value)?;
+        assert_eq!(result, 1234);
+        Ok(())
+    }
+
+    #[test]
+    fn test_i32() -> Result<(), ArrowError> {
+        let value = [
+            (VariantPrimitiveType::Int32 as u8) << 2, // Basic type
+            0x40,
+            0xe2,
+            0x01,
+            0x00, // Data
+        ];
+        let result = decode_int32(&value)?;
+        assert_eq!(result, 123456);
+        Ok(())

Review Comment:
   Well, _that_ turned into a rabbit hole: 
https://internals.rust-lang.org/t/misusing-v2-try-and-to-make-unit-tests-nicer/23079/3
   
   Feel free to ignore this whole distraction.



##########
parquet-variant/src/decoder.rs:
##########
@@ -90,6 +113,89 @@ pub(crate) fn decode_int8(value: &[u8]) -> Result<i8, 
ArrowError> {
     Ok(value)
 }
 
+/// Decodes an Int16 from the value section of a variant.
+pub(crate) fn decode_int16(value: &[u8]) -> Result<i16, ArrowError> {
+    let value = i16::from_le_bytes(array_from_slice(value, 1)?);
+    Ok(value)
+}
+
+/// Decodes an Int32 from the value section of a variant.
+pub(crate) fn decode_int32(value: &[u8]) -> Result<i32, ArrowError> {
+    let value = i32::from_le_bytes(array_from_slice(value, 1)?);
+    Ok(value)
+}
+
+/// Decodes an Int64 from the value section of a variant.
+pub(crate) fn decode_int64(value: &[u8]) -> Result<i64, ArrowError> {
+    let value = i64::from_le_bytes(array_from_slice(value, 1)?);
+    Ok(value)
+}
+
+/// Decodes a Decimal4 from the value section of a variant.
+pub(crate) fn decode_decimal4(value: &[u8]) -> Result<(i32, u8), ArrowError> {
+    let scale = u8::from_le_bytes(array_from_slice(value, 1)?);
+    let integer = i32::from_le_bytes(array_from_slice(value, 2)?);
+    Ok((integer, scale))
+}
+
+/// Decodes a Decimal8 from the value section of a variant.
+pub(crate) fn decode_decimal8(value: &[u8]) -> Result<(i64, u8), ArrowError> {
+    let scale = u8::from_le_bytes(array_from_slice(value, 1)?);
+    let integer = i64::from_le_bytes(array_from_slice(value, 2)?);
+    Ok((integer, scale))
+}
+
+/// Decodes a Decimal16 from the value section of a variant.
+pub(crate) fn decode_decimal16(value: &[u8]) -> Result<(i128, u8), ArrowError> 
{
+    let scale = u8::from_le_bytes(array_from_slice(value, 1)?);
+    let integer = i128::from_le_bytes(array_from_slice(value, 2)?);
+    Ok((integer, scale))
+}
+
+/// Decodes a Float from the value section of a variant.
+pub(crate) fn decode_float(value: &[u8]) -> Result<f32, ArrowError> {
+    let value = f32::from_le_bytes(array_from_slice(value, 1)?);
+    Ok(value)
+}
+
+/// Decodes a Date from the value section of a variant.
+pub(crate) fn decode_date(value: &[u8]) -> Result<NaiveDate, ArrowError> {
+    let days_since_epoch = i32::from_le_bytes(array_from_slice(value, 1)?);
+    let value = (DateTime::UNIX_EPOCH + Duration::days(days_since_epoch as 
i64)).date_naive();
+    Ok(value)
+}
+
+/// Decodes a TimestampMicros from the value section of a variant.
+pub(crate) fn decode_timestamp_micros(value: &[u8]) -> Result<DateTime<Utc>, 
ArrowError> {
+    let micros_since_epoch = i64::from_le_bytes(array_from_slice(value, 1)?);
+    if let Some(value) = DateTime::from_timestamp_micros(micros_since_epoch) {
+        Ok(value)
+    } else {
+        Err(ArrowError::CastError(format!(
+            "Could not cast `{micros_since_epoch}` microseconds into a 
DateTime<Utc>"
+        )))
+    }
+}
+
+/// Decodes a TimestampNTZMicros from the value section of a variant.
+pub(crate) fn decode_timestampntz_micros(value: &[u8]) -> 
Result<NaiveDateTime, ArrowError> {
+    let micros_since_epoch = i64::from_le_bytes(array_from_slice(value, 1)?);
+    if let Some(value) = DateTime::from_timestamp_micros(micros_since_epoch) {
+        Ok(value.naive_utc())
+    } else {
+        Err(ArrowError::CastError(format!(
+            "Could not cast `{micros_since_epoch}` microseconds into a 
NaiveDateTime"
+        )))
+    }

Review Comment:
   ```suggestion
       let value = decode_timestampntx_micros(value)?;
       Ok(value.naive_utc())
   ```



##########
parquet-variant/src/decoder.rs:
##########
@@ -90,6 +113,89 @@ pub(crate) fn decode_int8(value: &[u8]) -> Result<i8, 
ArrowError> {
     Ok(value)
 }
 
+/// Decodes an Int16 from the value section of a variant.
+pub(crate) fn decode_int16(value: &[u8]) -> Result<i16, ArrowError> {
+    let value = i16::from_le_bytes(array_from_slice(value, 1)?);
+    Ok(value)
+}
+
+/// Decodes an Int32 from the value section of a variant.
+pub(crate) fn decode_int32(value: &[u8]) -> Result<i32, ArrowError> {
+    let value = i32::from_le_bytes(array_from_slice(value, 1)?);
+    Ok(value)
+}
+
+/// Decodes an Int64 from the value section of a variant.
+pub(crate) fn decode_int64(value: &[u8]) -> Result<i64, ArrowError> {
+    let value = i64::from_le_bytes(array_from_slice(value, 1)?);
+    Ok(value)
+}
+
+/// Decodes a Decimal4 from the value section of a variant.
+pub(crate) fn decode_decimal4(value: &[u8]) -> Result<(i32, u8), ArrowError> {
+    let scale = u8::from_le_bytes(array_from_slice(value, 1)?);
+    let integer = i32::from_le_bytes(array_from_slice(value, 2)?);
+    Ok((integer, scale))
+}
+
+/// Decodes a Decimal8 from the value section of a variant.
+pub(crate) fn decode_decimal8(value: &[u8]) -> Result<(i64, u8), ArrowError> {
+    let scale = u8::from_le_bytes(array_from_slice(value, 1)?);
+    let integer = i64::from_le_bytes(array_from_slice(value, 2)?);
+    Ok((integer, scale))
+}
+
+/// Decodes a Decimal16 from the value section of a variant.
+pub(crate) fn decode_decimal16(value: &[u8]) -> Result<(i128, u8), ArrowError> 
{
+    let scale = u8::from_le_bytes(array_from_slice(value, 1)?);
+    let integer = i128::from_le_bytes(array_from_slice(value, 2)?);
+    Ok((integer, scale))
+}
+
+/// Decodes a Float from the value section of a variant.
+pub(crate) fn decode_float(value: &[u8]) -> Result<f32, ArrowError> {
+    let value = f32::from_le_bytes(array_from_slice(value, 1)?);
+    Ok(value)
+}
+
+/// Decodes a Date from the value section of a variant.
+pub(crate) fn decode_date(value: &[u8]) -> Result<NaiveDate, ArrowError> {
+    let days_since_epoch = i32::from_le_bytes(array_from_slice(value, 1)?);
+    let value = (DateTime::UNIX_EPOCH + Duration::days(days_since_epoch as 
i64)).date_naive();
+    Ok(value)
+}
+
+/// Decodes a TimestampMicros from the value section of a variant.
+pub(crate) fn decode_timestamp_micros(value: &[u8]) -> Result<DateTime<Utc>, 
ArrowError> {
+    let micros_since_epoch = i64::from_le_bytes(array_from_slice(value, 1)?);
+    if let Some(value) = DateTime::from_timestamp_micros(micros_since_epoch) {
+        Ok(value)
+    } else {
+        Err(ArrowError::CastError(format!(
+            "Could not cast `{micros_since_epoch}` microseconds into a 
DateTime<Utc>"
+        )))
+    }
+}
+
+/// Decodes a TimestampNTZMicros from the value section of a variant.
+pub(crate) fn decode_timestampntz_micros(value: &[u8]) -> 
Result<NaiveDateTime, ArrowError> {
+    let micros_since_epoch = i64::from_le_bytes(array_from_slice(value, 1)?);
+    if let Some(value) = DateTime::from_timestamp_micros(micros_since_epoch) {
+        Ok(value.naive_utc())
+    } else {
+        Err(ArrowError::CastError(format!(
+            "Could not cast `{micros_since_epoch}` microseconds into a 
NaiveDateTime"
+        )))
+    }
+}
+
+/// Decodes a Binary from the value section of a variant.
+pub(crate) fn decode_binary(value: &[u8]) -> Result<&[u8], ArrowError> {
+    let len = u32::from_le_bytes(array_from_slice(value, 1)?) as usize;
+    let value = slice_from_slice(value, 5..5 + len)?;
+    Ok(value)

Review Comment:
   Surprised clippy didn't complain about this?
   ```suggestion
       slice_from_slice(value, 5..5 + len)
   ```



##########
parquet-variant/src/variant.rs:
##########
@@ -451,6 +486,37 @@ impl<'m, 'v> Variant<'m, 'v> {
         }
     }
 
+    pub fn as_naive_date(self) -> Option<NaiveDate> {
+        if let Variant::Date(d) = self {
+            Some(d)
+        } else {
+            None
+        }

Review Comment:
   A bit more idiomatic and compact to match instead of if let?
   ```suggestion
           match self {
               Variant::Date(d) => Some(d),
               _ => None,
           }
   ```
   (more below)



##########
parquet-variant/src/variant.rs:
##########
@@ -461,9 +527,92 @@ impl<'m, 'v> Variant<'m, 'v> {
     pub fn as_int8(&self) -> Option<i8> {
         match *self {
             Variant::Int8(i) => Some(i),
-            // TODO: Add branches for type-widening/shortening when implemting 
rest of primitives for int
-            // Variant::Int16(i) => i.try_into().ok(),
-            // ...
+            Variant::Int16(i) => i.try_into().ok(),
+            Variant::Int32(i) => i.try_into().ok(),

Review Comment:
   I'm not sure what kind of casting we should automatically perform? Different 
callers could easily have different opinions and use cases. For example:
   * widening casts (e.g. 10i8 -> 10i32). Infallible.
   * lossless narrowing casts (e.g. 10i64 -> 10i32)
   * "mostly lossless" narrowing casts (e.g. decimal 10.000 -> 10i32). The 
numeric value doesn't change but precision information is lost (e.g. use cases 
that worry about significant digits treat `10`, `10.` and `10.000` very 
differently). 
   * lossy narrowing casts (e.g. 3.14f32 -> 3i32)
   * converting casts (e.g. "10" -> 10i32)
   * lossy converting casts (e.g. "3.14" -> 3i32)
   
   In spark, for example, I believe the standard cast operator includes all but 
the last case. Other systems might be strict and only allow widening casts, or 
widening plus lossless narrowing casts. How do we decide where to draw the 
line, given that there is no universally "correct" answer?



##########
parquet-variant/src/variant.rs:
##########
@@ -461,9 +527,92 @@ impl<'m, 'v> Variant<'m, 'v> {
     pub fn as_int8(&self) -> Option<i8> {
         match *self {
             Variant::Int8(i) => Some(i),
-            // TODO: Add branches for type-widening/shortening when implemting 
rest of primitives for int
-            // Variant::Int16(i) => i.try_into().ok(),
-            // ...
+            Variant::Int16(i) => i.try_into().ok(),
+            Variant::Int32(i) => i.try_into().ok(),
+            Variant::Int64(i) => i.try_into().ok(),
+            _ => None,
+        }
+    }
+
+    pub fn as_int16(&self) -> Option<i16> {
+        match *self {
+            Variant::Int8(i) => Some(i.into()),
+            Variant::Int16(i) => Some(i),
+            Variant::Int32(i) => i.try_into().ok(),
+            Variant::Int64(i) => i.try_into().ok(),
+            _ => None,
+        }
+    }
+    pub fn as_int32(&self) -> Option<i32> {
+        match *self {
+            Variant::Int8(i) => Some(i.into()),
+            Variant::Int16(i) => Some(i.into()),
+            Variant::Int32(i) => Some(i),
+            Variant::Int64(i) => i.try_into().ok(),
+            _ => None,
+        }
+    }
+    pub fn as_int64(&self) -> Option<i64> {
+        match *self {
+            Variant::Int8(i) => Some(i.into()),
+            Variant::Int16(i) => Some(i.into()),
+            Variant::Int32(i) => Some(i.into()),
+            Variant::Int64(i) => Some(i),
+            _ => None,
+        }
+    }
+
+    pub fn as_decimal_int32(&self) -> Option<(i32, u8)> {

Review Comment:
   If we're trying to match the spec wording, should this be `as_decimal4`?



##########
parquet-variant/src/variant.rs:
##########
@@ -483,6 +632,57 @@ impl<'m, 'v> From<i8> for Variant<'m, 'v> {
     }
 }
 
+impl<'m, 'v> From<i16> for Variant<'m, 'v> {
+    fn from(value: i16) -> Self {
+        Variant::Int16(value)
+    }
+}
+
+impl<'m, 'v> From<i32> for Variant<'m, 'v> {
+    fn from(value: i32) -> Self {
+        Variant::Int32(value)
+    }
+}
+
+impl<'m, 'v> From<i64> for Variant<'m, 'v> {

Review Comment:
   Newer versions of clippy suggest:
   ```suggestion
   impl From<i64> for Variant<'_, '_> {
   ```
   (several of these cases)



##########
parquet-variant/src/variant.rs:
##########
@@ -402,11 +403,21 @@ pub enum Variant<'m, 'v> {
     // TODO: Add types for the rest of the primitive types, once API is agreed 
upon
     Null,
     Int8(i8),
-
+    Int16(i16),
+    Int32(i32),
+    Int64(i64),
+    Date(NaiveDate),
+    TimestampMicros(DateTime<Utc>),
+    TimestampNTZMicros(NaiveDateTime),
+    Decimal4 { integer: i32, scale: u8 },

Review Comment:
   The spec calls the integer form of a decimal an "unscaled value." 
   Should we try to capture that in our naming here?



##########
parquet-variant/src/variant.rs:
##########
@@ -423,9 +434,33 @@ impl<'m, 'v> Variant<'m, 'v> {
             VariantBasicType::Primitive => match get_primitive_type(header)? {

Review Comment:
   I had wondered the same thing... but there's also an argument to be made 
that the header is indeed part of the variant value. A missing first byte would 
be a problem if we ever needed to retrieve the full value.
   
   Maybe something to track as the implementation and usage of this API 
matures? I don't think we have enough information to make a confident decision 
quite yet?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to