superserious-dev commented on code in PR #7644: URL: https://github.com/apache/arrow-rs/pull/7644#discussion_r2143407764
########## parquet-variant/tests/variant_interop.rs: ########## @@ -46,22 +47,22 @@ fn get_primitive_cases() -> Vec<(&'static str, Variant<'static, 'static>)> { // Cases are commented out // Enabling is tracked in https://github.com/apache/arrow-rs/issues/7630 vec![ - // ("primitive_binary", Variant::Binary), + ("primitive_binary", Variant::Binary(&[0x03, 0x13, 0x37, 0xde, 0xad, 0xbe, 0xef, 0xca, 0xfe])), ("primitive_boolean_false", Variant::BooleanFalse), ("primitive_boolean_true", Variant::BooleanTrue), - // ("primitive_date", Variant::Null), - //("primitive_decimal4", Variant::Null), - //("primitive_decimal8", Variant::Null), - //("primitive_decimal16", Variant::Null), - //("primitive_float", Variant::Null), + ("primitive_date", Variant::Date(NaiveDate::from_ymd_opt(2025, 4 , 16).unwrap())), + ("primitive_decimal4", Variant::Decimal4{integer: 1234, scale: 2}), + ("primitive_decimal8", Variant::Decimal8{integer: 1234567890, scale: 2}), + ("primitive_decimal16", Variant::Decimal16{integer: 1234567891234567890, scale: 2}), + ("primitive_float", Variant::Float(1234567890.1234)), ("primitive_int8", Variant::Int8(42)), - //("primitive_int16", Variant::Null), - //("primitive_int32", Variant::Null), - //("primitive_int64", Variant::Null), + ("primitive_int16", Variant::Int16(1234)), + ("primitive_int32", Variant::Int32(123456)), + ("primitive_int64", Variant::Int64(1234567890123456789)), ("primitive_null", Variant::Null), ("primitive_string", Variant::String("This string is longer than 64 bytes and therefore does not fit in a short_string and it also includes several non ascii characters such as 🐢, 💖, ♥\u{fe0f}, 🎣 and 🤦!!")), - //("primitive_timestamp", Variant::Null), - //("primitive_timestampntz", Variant::Null), + ("primitive_timestamp", Variant::TimestampMicros(NaiveDate::from_ymd_opt(2025, 4, 16).unwrap().and_hms_milli_opt(16, 34, 56, 780).unwrap().and_utc())), Review Comment: Yes, this lines up perfectly: the Eastern timezone is 4 hours behind UTC and the naive input timestamp is being shifted to UTC. (The two timestamps are 4 hours apart.) ########## parquet-variant/src/variant.rs: ########## @@ -461,9 +527,92 @@ impl<'m, 'v> Variant<'m, 'v> { pub fn as_int8(&self) -> Option<i8> { match *self { Variant::Int8(i) => Some(i), - // TODO: Add branches for type-widening/shortening when implemting rest of primitives for int - // Variant::Int16(i) => i.try_into().ok(), - // ... + Variant::Int16(i) => i.try_into().ok(), + Variant::Int32(i) => i.try_into().ok(), + Variant::Int64(i) => i.try_into().ok(), + _ => None, + } + } + + pub fn as_int16(&self) -> Option<i16> { Review Comment: Doctests are a great idea. I added them for all the methods. ########## parquet-variant/src/decoder.rs: ########## @@ -112,14 +218,222 @@ mod tests { #[test] fn test_i8() -> Result<(), ArrowError> { let value = [ - 3 << 2, // Primitive type for i8 - 42, + (VariantPrimitiveType::Int8 as u8) << 2, // Basic type + 0x2a, // Data ]; let result = decode_int8(&value)?; assert_eq!(result, 42); Ok(()) } + #[test] + fn test_i16() -> Result<(), ArrowError> { + let value = [ + (VariantPrimitiveType::Int16 as u8) << 2, // Basic type + 0xd2, + 0x04, // Data + ]; + let result = decode_int16(&value)?; + assert_eq!(result, 1234); Review Comment: I think it makes sense to have input be raw bytes and output be the cast rust datatype. For example with the `test_date` case, it asserts that the output is a `NaiveDate` rather than raw bytes. ########## parquet-variant/src/variant.rs: ########## @@ -461,9 +527,92 @@ impl<'m, 'v> Variant<'m, 'v> { pub fn as_int8(&self) -> Option<i8> { match *self { Variant::Int8(i) => Some(i), - // TODO: Add branches for type-widening/shortening when implemting rest of primitives for int - // Variant::Int16(i) => i.try_into().ok(), - // ... + Variant::Int16(i) => i.try_into().ok(), + Variant::Int32(i) => i.try_into().ok(), + Variant::Int64(i) => i.try_into().ok(), + _ => None, + } + } + + pub fn as_int16(&self) -> Option<i16> { + match *self { + Variant::Int8(i) => Some(i.into()), + Variant::Int16(i) => Some(i), + Variant::Int32(i) => i.try_into().ok(), + Variant::Int64(i) => i.try_into().ok(), + _ => None, + } + } + pub fn as_int32(&self) -> Option<i32> { + match *self { + Variant::Int8(i) => Some(i.into()), + Variant::Int16(i) => Some(i.into()), + Variant::Int32(i) => Some(i), + Variant::Int64(i) => i.try_into().ok(), + _ => None, + } + } + pub fn as_int64(&self) -> Option<i64> { + match *self { + Variant::Int8(i) => Some(i.into()), + Variant::Int16(i) => Some(i.into()), + Variant::Int32(i) => Some(i.into()), + Variant::Int64(i) => Some(i), + _ => None, + } + } + + pub fn as_decimal_int32(&self) -> Option<(i32, u8)> { Review Comment: I wasn't super sure on naming this `as_decimal_int32` because it seems like the convention is `as_<rust-equivalent-type>`. I'm open to changing the name, although not sure if `as_decimal4` captures the intent of the method. ########## parquet-variant/src/decoder.rs: ########## @@ -33,7 +34,18 @@ pub enum VariantPrimitiveType { BooleanTrue = 1, BooleanFalse = 2, Int8 = 3, + Int16 = 4, + Int32 = 5, + Int64 = 6, + Decimal4 = 8, + Decimal8 = 9, + Decimal16 = 10, // TODO: Add types for the rest of primitives, once API is agreed upon + Date = 11, + TimestampMicros = 12, + TimestampNTZMicros = 13, + Float = 14, + Binary = 15, String = 16, } Review Comment: Based on this [comment](https://github.com/apache/parquet-testing/blob/b68bea40fed8d1a780a9e09dd2262017e04b19ad/variant/regen.py#L78-L83), these don't appear to be ready to add yet. ########## parquet-variant/src/variant.rs: ########## @@ -402,11 +403,21 @@ pub enum Variant<'m, 'v> { // TODO: Add types for the rest of the primitive types, once API is agreed upon Null, Int8(i8), - + Int16(i16), + Int32(i32), + Int64(i64), + Date(NaiveDate), + TimestampMicros(DateTime<Utc>), + TimestampNTZMicros(NaiveDateTime), + Decimal4 { integer: i32, scale: u8 }, + Decimal8 { integer: i64, scale: u8 }, + Decimal16 { integer: i128, scale: u8 }, Review Comment: Definitely something to consider. Perhaps the naming of the fields and potential new struct(s) could be refined in a follow-up PR? ########## parquet-variant/src/variant.rs: ########## @@ -461,9 +527,92 @@ impl<'m, 'v> Variant<'m, 'v> { pub fn as_int8(&self) -> Option<i8> { match *self { Variant::Int8(i) => Some(i), - // TODO: Add branches for type-widening/shortening when implemting rest of primitives for int - // Variant::Int16(i) => i.try_into().ok(), - // ... + Variant::Int16(i) => i.try_into().ok(), + Variant::Int32(i) => i.try_into().ok(), Review Comment: Yes, I agree that this could use some more thought to determine where the line should be. For now, I added `widening`, `lossless narrowing`, and `mostly lossless` casts. Perhaps these can be refined in a follow-up PR based on the outcome of the discussions. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org