superserious-dev commented on code in PR #7644:
URL: https://github.com/apache/arrow-rs/pull/7644#discussion_r2143407764


##########
parquet-variant/tests/variant_interop.rs:
##########
@@ -46,22 +47,22 @@ fn get_primitive_cases() -> Vec<(&'static str, 
Variant<'static, 'static>)> {
     // Cases are commented out
     // Enabling is tracked in  https://github.com/apache/arrow-rs/issues/7630
     vec![
-        // ("primitive_binary", Variant::Binary),
+        ("primitive_binary", Variant::Binary(&[0x03, 0x13, 0x37, 0xde, 0xad, 
0xbe, 0xef, 0xca, 0xfe])),
         ("primitive_boolean_false", Variant::BooleanFalse),
         ("primitive_boolean_true", Variant::BooleanTrue),
-        // ("primitive_date", Variant::Null),
-        //("primitive_decimal4", Variant::Null),
-        //("primitive_decimal8", Variant::Null),
-        //("primitive_decimal16", Variant::Null),
-        //("primitive_float", Variant::Null),
+        ("primitive_date", Variant::Date(NaiveDate::from_ymd_opt(2025, 4 , 
16).unwrap())),
+        ("primitive_decimal4", Variant::Decimal4{integer: 1234, scale: 2}),
+        ("primitive_decimal8", Variant::Decimal8{integer: 1234567890, scale: 
2}),
+        ("primitive_decimal16", Variant::Decimal16{integer: 
1234567891234567890, scale: 2}),
+        ("primitive_float", Variant::Float(1234567890.1234)),
         ("primitive_int8", Variant::Int8(42)),
-        //("primitive_int16", Variant::Null),
-        //("primitive_int32", Variant::Null),
-        //("primitive_int64", Variant::Null),
+        ("primitive_int16", Variant::Int16(1234)),
+        ("primitive_int32", Variant::Int32(123456)),
+        ("primitive_int64", Variant::Int64(1234567890123456789)),
         ("primitive_null", Variant::Null),
         ("primitive_string", Variant::String("This string is longer than 64 
bytes and therefore does not fit in a short_string and it also includes several 
non ascii characters such as 🐢, 💖, ♥\u{fe0f}, 🎣 and 🤦!!")),
-        //("primitive_timestamp", Variant::Null),
-        //("primitive_timestampntz", Variant::Null),
+        ("primitive_timestamp", 
Variant::TimestampMicros(NaiveDate::from_ymd_opt(2025, 4, 
16).unwrap().and_hms_milli_opt(16, 34, 56, 780).unwrap().and_utc())),

Review Comment:
   Yes, this lines up perfectly: the Eastern timezone is 4 hours behind UTC and 
the naive input timestamp is being shifted to UTC.
   (The two timestamps are 4 hours apart.)



##########
parquet-variant/src/variant.rs:
##########
@@ -461,9 +527,92 @@ impl<'m, 'v> Variant<'m, 'v> {
     pub fn as_int8(&self) -> Option<i8> {
         match *self {
             Variant::Int8(i) => Some(i),
-            // TODO: Add branches for type-widening/shortening when implemting 
rest of primitives for int
-            // Variant::Int16(i) => i.try_into().ok(),
-            // ...
+            Variant::Int16(i) => i.try_into().ok(),
+            Variant::Int32(i) => i.try_into().ok(),
+            Variant::Int64(i) => i.try_into().ok(),
+            _ => None,
+        }
+    }
+
+    pub fn as_int16(&self) -> Option<i16> {

Review Comment:
   Doctests are a great idea. I added them for all the methods.



##########
parquet-variant/src/decoder.rs:
##########
@@ -112,14 +218,222 @@ mod tests {
     #[test]
     fn test_i8() -> Result<(), ArrowError> {
         let value = [
-            3 << 2, // Primitive type for i8
-            42,
+            (VariantPrimitiveType::Int8 as u8) << 2, // Basic type
+            0x2a,                                    // Data
         ];
         let result = decode_int8(&value)?;
         assert_eq!(result, 42);
         Ok(())
     }
 
+    #[test]
+    fn test_i16() -> Result<(), ArrowError> {
+        let value = [
+            (VariantPrimitiveType::Int16 as u8) << 2, // Basic type
+            0xd2,
+            0x04, // Data
+        ];
+        let result = decode_int16(&value)?;
+        assert_eq!(result, 1234);

Review Comment:
   I think it makes sense to have input be raw bytes and output be the cast 
rust datatype. For example with the `test_date` case, it asserts that the 
output is a `NaiveDate` rather than raw bytes.



##########
parquet-variant/src/variant.rs:
##########
@@ -461,9 +527,92 @@ impl<'m, 'v> Variant<'m, 'v> {
     pub fn as_int8(&self) -> Option<i8> {
         match *self {
             Variant::Int8(i) => Some(i),
-            // TODO: Add branches for type-widening/shortening when implemting 
rest of primitives for int
-            // Variant::Int16(i) => i.try_into().ok(),
-            // ...
+            Variant::Int16(i) => i.try_into().ok(),
+            Variant::Int32(i) => i.try_into().ok(),
+            Variant::Int64(i) => i.try_into().ok(),
+            _ => None,
+        }
+    }
+
+    pub fn as_int16(&self) -> Option<i16> {
+        match *self {
+            Variant::Int8(i) => Some(i.into()),
+            Variant::Int16(i) => Some(i),
+            Variant::Int32(i) => i.try_into().ok(),
+            Variant::Int64(i) => i.try_into().ok(),
+            _ => None,
+        }
+    }
+    pub fn as_int32(&self) -> Option<i32> {
+        match *self {
+            Variant::Int8(i) => Some(i.into()),
+            Variant::Int16(i) => Some(i.into()),
+            Variant::Int32(i) => Some(i),
+            Variant::Int64(i) => i.try_into().ok(),
+            _ => None,
+        }
+    }
+    pub fn as_int64(&self) -> Option<i64> {
+        match *self {
+            Variant::Int8(i) => Some(i.into()),
+            Variant::Int16(i) => Some(i.into()),
+            Variant::Int32(i) => Some(i.into()),
+            Variant::Int64(i) => Some(i),
+            _ => None,
+        }
+    }
+
+    pub fn as_decimal_int32(&self) -> Option<(i32, u8)> {

Review Comment:
   I wasn't super sure on naming this `as_decimal_int32` because it seems like 
the convention is `as_<rust-equivalent-type>`. I'm open to changing the name, 
although not sure if `as_decimal4` captures the intent of the method.



##########
parquet-variant/src/decoder.rs:
##########
@@ -33,7 +34,18 @@ pub enum VariantPrimitiveType {
     BooleanTrue = 1,
     BooleanFalse = 2,
     Int8 = 3,
+    Int16 = 4,
+    Int32 = 5,
+    Int64 = 6,
+    Decimal4 = 8,
+    Decimal8 = 9,
+    Decimal16 = 10,
     // TODO: Add types for the rest of primitives, once API is agreed upon
+    Date = 11,
+    TimestampMicros = 12,
+    TimestampNTZMicros = 13,
+    Float = 14,
+    Binary = 15,
     String = 16,
 }

Review Comment:
   Based on this 
[comment](https://github.com/apache/parquet-testing/blob/b68bea40fed8d1a780a9e09dd2262017e04b19ad/variant/regen.py#L78-L83),
 these don't appear to be ready to add yet.



##########
parquet-variant/src/variant.rs:
##########
@@ -402,11 +403,21 @@ pub enum Variant<'m, 'v> {
     // TODO: Add types for the rest of the primitive types, once API is agreed 
upon
     Null,
     Int8(i8),
-
+    Int16(i16),
+    Int32(i32),
+    Int64(i64),
+    Date(NaiveDate),
+    TimestampMicros(DateTime<Utc>),
+    TimestampNTZMicros(NaiveDateTime),
+    Decimal4 { integer: i32, scale: u8 },
+    Decimal8 { integer: i64, scale: u8 },
+    Decimal16 { integer: i128, scale: u8 },

Review Comment:
   Definitely something to consider. Perhaps the naming of the fields and 
potential new struct(s) could be refined in a follow-up PR?



##########
parquet-variant/src/variant.rs:
##########
@@ -461,9 +527,92 @@ impl<'m, 'v> Variant<'m, 'v> {
     pub fn as_int8(&self) -> Option<i8> {
         match *self {
             Variant::Int8(i) => Some(i),
-            // TODO: Add branches for type-widening/shortening when implemting 
rest of primitives for int
-            // Variant::Int16(i) => i.try_into().ok(),
-            // ...
+            Variant::Int16(i) => i.try_into().ok(),
+            Variant::Int32(i) => i.try_into().ok(),

Review Comment:
   Yes, I agree that this could use some more thought to determine where the 
line should be. For now, I added `widening`, `lossless narrowing`, and `mostly 
lossless` casts. Perhaps these can be refined in a follow-up PR based on the 
outcome of the discussions.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to