alamb commented on code in PR #7644:
URL: https://github.com/apache/arrow-rs/pull/7644#discussion_r2141007381


##########
parquet-variant/src/decoder.rs:
##########
@@ -33,7 +34,18 @@ pub enum VariantPrimitiveType {
     BooleanTrue = 1,
     BooleanFalse = 2,
     Int8 = 3,
+    Int16 = 4,
+    Int32 = 5,
+    Int64 = 6,
+    Decimal4 = 8,
+    Decimal8 = 9,
+    Decimal16 = 10,
     // TODO: Add types for the rest of primitives, once API is agreed upon

Review Comment:
   I think we can remove this TODO as well



##########
parquet-variant/src/decoder.rs:
##########
@@ -64,7 +76,18 @@ impl TryFrom<u8> for VariantPrimitiveType {
             1 => Ok(VariantPrimitiveType::BooleanTrue),
             2 => Ok(VariantPrimitiveType::BooleanFalse),
             3 => Ok(VariantPrimitiveType::Int8),
+            4 => Ok(VariantPrimitiveType::Int16),
+            5 => Ok(VariantPrimitiveType::Int32),
+            6 => Ok(VariantPrimitiveType::Int64),
             // TODO: Add types for the rest, once API is agreed upon

Review Comment:
   Likewise I think this TODO is no longer relevant



##########
parquet-variant/tests/variant_interop.rs:
##########
@@ -46,22 +47,22 @@ fn get_primitive_cases() -> Vec<(&'static str, 
Variant<'static, 'static>)> {
     // Cases are commented out
     // Enabling is tracked in  https://github.com/apache/arrow-rs/issues/7630
     vec![
-        // ("primitive_binary", Variant::Binary),
+        ("primitive_binary", Variant::Binary(&[0x03, 0x13, 0x37, 0xde, 0xad, 
0xbe, 0xef, 0xca, 0xfe])),
         ("primitive_boolean_false", Variant::BooleanFalse),
         ("primitive_boolean_true", Variant::BooleanTrue),
-        // ("primitive_date", Variant::Null),
-        //("primitive_decimal4", Variant::Null),
-        //("primitive_decimal8", Variant::Null),
-        //("primitive_decimal16", Variant::Null),
-        //("primitive_float", Variant::Null),
+        ("primitive_date", Variant::Date(NaiveDate::from_ymd_opt(2025, 4 , 
16).unwrap())),
+        ("primitive_decimal4", Variant::Decimal4{integer: 1234, scale: 2}),
+        ("primitive_decimal8", Variant::Decimal8{integer: 1234567890, scale: 
2}),
+        ("primitive_decimal16", Variant::Decimal16{integer: 
1234567891234567890, scale: 2}),
+        ("primitive_float", Variant::Float(1234567890.1234)),
         ("primitive_int8", Variant::Int8(42)),
-        //("primitive_int16", Variant::Null),
-        //("primitive_int32", Variant::Null),
-        //("primitive_int64", Variant::Null),
+        ("primitive_int16", Variant::Int16(1234)),
+        ("primitive_int32", Variant::Int32(123456)),
+        ("primitive_int64", Variant::Int64(1234567890123456789)),
         ("primitive_null", Variant::Null),
         ("primitive_string", Variant::String("This string is longer than 64 
bytes and therefore does not fit in a short_string and it also includes several 
non ascii characters such as 🐢, 💖, ♥\u{fe0f}, 🎣 and 🤦!!")),
-        //("primitive_timestamp", Variant::Null),
-        //("primitive_timestampntz", Variant::Null),
+        ("primitive_timestamp", 
Variant::TimestampMicros(NaiveDate::from_ymd_opt(2025, 4, 
16).unwrap().and_hms_milli_opt(16, 34, 56, 780).unwrap().and_utc())),

Review Comment:
   I ran the regen script in the Americas/New_York timezone -- does that line 
up correctly?
   
   If so, then I think we (I) can make a PR to parquet-testing to clarify that 
in the repo



##########
parquet-variant/src/variant.rs:
##########
@@ -402,11 +403,21 @@ pub enum Variant<'m, 'v> {
     // TODO: Add types for the rest of the primitive types, once API is agreed 
upon

Review Comment:
   Also, we can remove this TODO



##########
parquet-variant/src/variant.rs:
##########
@@ -451,6 +486,37 @@ impl<'m, 'v> Variant<'m, 'v> {
         }
     }
 
+    pub fn as_naive_date(self) -> Option<NaiveDate> {

Review Comment:
   I think it would be really nice to add some doc comments to these methods 
even if it seems somewhat repetitive. 
   
   We can do it in a follow on PR as well



##########
parquet-variant/src/decoder.rs:
##########
@@ -112,14 +218,222 @@ mod tests {
     #[test]
     fn test_i8() -> Result<(), ArrowError> {
         let value = [
-            3 << 2, // Primitive type for i8
-            42,
+            (VariantPrimitiveType::Int8 as u8) << 2, // Basic type
+            0x2a,                                    // Data
         ];
         let result = decode_int8(&value)?;
         assert_eq!(result, 42);
         Ok(())
     }
 
+    #[test]
+    fn test_i16() -> Result<(), ArrowError> {
+        let value = [
+            (VariantPrimitiveType::Int16 as u8) << 2, // Basic type
+            0xd2,
+            0x04, // Data
+        ];
+        let result = decode_int16(&value)?;
+        assert_eq!(result, 1234);
+        Ok(())
+    }
+
+    #[test]
+    fn test_i32() -> Result<(), ArrowError> {
+        let value = [
+            (VariantPrimitiveType::Int32 as u8) << 2, // Basic type
+            0x40,
+            0xe2,
+            0x01,
+            0x00, // Data
+        ];
+        let result = decode_int32(&value)?;
+        assert_eq!(result, 123456);
+        Ok(())
+    }
+
+    #[test]
+    fn test_i64() -> Result<(), ArrowError> {
+        let value = [
+            (VariantPrimitiveType::Int64 as u8) << 2, // Basic type
+            0x15,
+            0x81,
+            0xe9,
+            0x7d,
+            0xf4,
+            0x10,
+            0x22,
+            0x11, // Data
+        ];
+        let result = decode_int64(&value)?;
+        assert_eq!(result, 1234567890123456789);
+        Ok(())
+    }
+
+    #[test]
+    fn test_decimal4() -> Result<(), ArrowError> {

Review Comment:
   Filed https://github.com/apache/arrow-rs/issues/7645



##########
parquet-variant/src/variant.rs:
##########
@@ -461,9 +527,92 @@ impl<'m, 'v> Variant<'m, 'v> {
     pub fn as_int8(&self) -> Option<i8> {
         match *self {
             Variant::Int8(i) => Some(i),
-            // TODO: Add branches for type-widening/shortening when implemting 
rest of primitives for int
-            // Variant::Int16(i) => i.try_into().ok(),
-            // ...
+            Variant::Int16(i) => i.try_into().ok(),
+            Variant::Int32(i) => i.try_into().ok(),
+            Variant::Int64(i) => i.try_into().ok(),
+            _ => None,
+        }
+    }
+
+    pub fn as_int16(&self) -> Option<i16> {

Review Comment:
   Yes, I think it would be great to add tests
   
   What I recommend doing is adding doc examples that will:
   1. Make it easier to explain what is happening
   2. Run as part of CI
   
   Something like
   
   ```rust
   let variant_i32 = Variant::from(1337i32);
   // you can read an int32 value as int16
   assert_eq!(variant_i32.as_i16(), Some(Variant::from(1337i16));
   // but not as an int8
   assert_eq(variant_i32.as_i8(), None)
   ```
   
   If you have time to do it in this PR that would be great, otherwise a follow 
on would be wonderful



##########
parquet-variant/src/decoder.rs:
##########
@@ -90,6 +113,89 @@ pub(crate) fn decode_int8(value: &[u8]) -> Result<i8, 
ArrowError> {
     Ok(value)
 }
 
+/// Decodes an Int16 from the value section of a variant.
+pub(crate) fn decode_int16(value: &[u8]) -> Result<i16, ArrowError> {
+    let value = i16::from_le_bytes(array_from_slice(value, 1)?);
+    Ok(value)
+}
+
+/// Decodes an Int32 from the value section of a variant.
+pub(crate) fn decode_int32(value: &[u8]) -> Result<i32, ArrowError> {
+    let value = i32::from_le_bytes(array_from_slice(value, 1)?);

Review Comment:
   I had to double check the reason for this constant `1`  a few times. I found 
it pretty confusing that it is an offset.
   
   We can always find a better way to structure this code in subsequent PRs, 
but it might make sense to slice the input `value` so this code always looks 
like  the following (no offset) -- and eventually remove the offset parameter 
all together
   
   ```rust
       let value = i32::from_le_bytes(array_from_slice(value, 0)?);
   ```
   
   That being said since it is all crate private, I think this is totally good. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to