alamb commented on code in PR #7644: URL: https://github.com/apache/arrow-rs/pull/7644#discussion_r2141007381
########## parquet-variant/src/decoder.rs: ########## @@ -33,7 +34,18 @@ pub enum VariantPrimitiveType { BooleanTrue = 1, BooleanFalse = 2, Int8 = 3, + Int16 = 4, + Int32 = 5, + Int64 = 6, + Decimal4 = 8, + Decimal8 = 9, + Decimal16 = 10, // TODO: Add types for the rest of primitives, once API is agreed upon Review Comment: I think we can remove this TODO as well ########## parquet-variant/src/decoder.rs: ########## @@ -64,7 +76,18 @@ impl TryFrom<u8> for VariantPrimitiveType { 1 => Ok(VariantPrimitiveType::BooleanTrue), 2 => Ok(VariantPrimitiveType::BooleanFalse), 3 => Ok(VariantPrimitiveType::Int8), + 4 => Ok(VariantPrimitiveType::Int16), + 5 => Ok(VariantPrimitiveType::Int32), + 6 => Ok(VariantPrimitiveType::Int64), // TODO: Add types for the rest, once API is agreed upon Review Comment: Likewise I think this TODO is no longer relevant ########## parquet-variant/tests/variant_interop.rs: ########## @@ -46,22 +47,22 @@ fn get_primitive_cases() -> Vec<(&'static str, Variant<'static, 'static>)> { // Cases are commented out // Enabling is tracked in https://github.com/apache/arrow-rs/issues/7630 vec![ - // ("primitive_binary", Variant::Binary), + ("primitive_binary", Variant::Binary(&[0x03, 0x13, 0x37, 0xde, 0xad, 0xbe, 0xef, 0xca, 0xfe])), ("primitive_boolean_false", Variant::BooleanFalse), ("primitive_boolean_true", Variant::BooleanTrue), - // ("primitive_date", Variant::Null), - //("primitive_decimal4", Variant::Null), - //("primitive_decimal8", Variant::Null), - //("primitive_decimal16", Variant::Null), - //("primitive_float", Variant::Null), + ("primitive_date", Variant::Date(NaiveDate::from_ymd_opt(2025, 4 , 16).unwrap())), + ("primitive_decimal4", Variant::Decimal4{integer: 1234, scale: 2}), + ("primitive_decimal8", Variant::Decimal8{integer: 1234567890, scale: 2}), + ("primitive_decimal16", Variant::Decimal16{integer: 1234567891234567890, scale: 2}), + ("primitive_float", Variant::Float(1234567890.1234)), ("primitive_int8", Variant::Int8(42)), - //("primitive_int16", Variant::Null), - //("primitive_int32", Variant::Null), - //("primitive_int64", Variant::Null), + ("primitive_int16", Variant::Int16(1234)), + ("primitive_int32", Variant::Int32(123456)), + ("primitive_int64", Variant::Int64(1234567890123456789)), ("primitive_null", Variant::Null), ("primitive_string", Variant::String("This string is longer than 64 bytes and therefore does not fit in a short_string and it also includes several non ascii characters such as 🐢, 💖, ♥\u{fe0f}, 🎣 and 🤦!!")), - //("primitive_timestamp", Variant::Null), - //("primitive_timestampntz", Variant::Null), + ("primitive_timestamp", Variant::TimestampMicros(NaiveDate::from_ymd_opt(2025, 4, 16).unwrap().and_hms_milli_opt(16, 34, 56, 780).unwrap().and_utc())), Review Comment: I ran the regen script in the Americas/New_York timezone -- does that line up correctly? If so, then I think we (I) can make a PR to parquet-testing to clarify that in the repo ########## parquet-variant/src/variant.rs: ########## @@ -402,11 +403,21 @@ pub enum Variant<'m, 'v> { // TODO: Add types for the rest of the primitive types, once API is agreed upon Review Comment: Also, we can remove this TODO ########## parquet-variant/src/variant.rs: ########## @@ -451,6 +486,37 @@ impl<'m, 'v> Variant<'m, 'v> { } } + pub fn as_naive_date(self) -> Option<NaiveDate> { Review Comment: I think it would be really nice to add some doc comments to these methods even if it seems somewhat repetitive. We can do it in a follow on PR as well ########## parquet-variant/src/decoder.rs: ########## @@ -112,14 +218,222 @@ mod tests { #[test] fn test_i8() -> Result<(), ArrowError> { let value = [ - 3 << 2, // Primitive type for i8 - 42, + (VariantPrimitiveType::Int8 as u8) << 2, // Basic type + 0x2a, // Data ]; let result = decode_int8(&value)?; assert_eq!(result, 42); Ok(()) } + #[test] + fn test_i16() -> Result<(), ArrowError> { + let value = [ + (VariantPrimitiveType::Int16 as u8) << 2, // Basic type + 0xd2, + 0x04, // Data + ]; + let result = decode_int16(&value)?; + assert_eq!(result, 1234); + Ok(()) + } + + #[test] + fn test_i32() -> Result<(), ArrowError> { + let value = [ + (VariantPrimitiveType::Int32 as u8) << 2, // Basic type + 0x40, + 0xe2, + 0x01, + 0x00, // Data + ]; + let result = decode_int32(&value)?; + assert_eq!(result, 123456); + Ok(()) + } + + #[test] + fn test_i64() -> Result<(), ArrowError> { + let value = [ + (VariantPrimitiveType::Int64 as u8) << 2, // Basic type + 0x15, + 0x81, + 0xe9, + 0x7d, + 0xf4, + 0x10, + 0x22, + 0x11, // Data + ]; + let result = decode_int64(&value)?; + assert_eq!(result, 1234567890123456789); + Ok(()) + } + + #[test] + fn test_decimal4() -> Result<(), ArrowError> { Review Comment: Filed https://github.com/apache/arrow-rs/issues/7645 ########## parquet-variant/src/variant.rs: ########## @@ -461,9 +527,92 @@ impl<'m, 'v> Variant<'m, 'v> { pub fn as_int8(&self) -> Option<i8> { match *self { Variant::Int8(i) => Some(i), - // TODO: Add branches for type-widening/shortening when implemting rest of primitives for int - // Variant::Int16(i) => i.try_into().ok(), - // ... + Variant::Int16(i) => i.try_into().ok(), + Variant::Int32(i) => i.try_into().ok(), + Variant::Int64(i) => i.try_into().ok(), + _ => None, + } + } + + pub fn as_int16(&self) -> Option<i16> { Review Comment: Yes, I think it would be great to add tests What I recommend doing is adding doc examples that will: 1. Make it easier to explain what is happening 2. Run as part of CI Something like ```rust let variant_i32 = Variant::from(1337i32); // you can read an int32 value as int16 assert_eq!(variant_i32.as_i16(), Some(Variant::from(1337i16)); // but not as an int8 assert_eq(variant_i32.as_i8(), None) ``` If you have time to do it in this PR that would be great, otherwise a follow on would be wonderful ########## parquet-variant/src/decoder.rs: ########## @@ -90,6 +113,89 @@ pub(crate) fn decode_int8(value: &[u8]) -> Result<i8, ArrowError> { Ok(value) } +/// Decodes an Int16 from the value section of a variant. +pub(crate) fn decode_int16(value: &[u8]) -> Result<i16, ArrowError> { + let value = i16::from_le_bytes(array_from_slice(value, 1)?); + Ok(value) +} + +/// Decodes an Int32 from the value section of a variant. +pub(crate) fn decode_int32(value: &[u8]) -> Result<i32, ArrowError> { + let value = i32::from_le_bytes(array_from_slice(value, 1)?); Review Comment: I had to double check the reason for this constant `1` a few times. I found it pretty confusing that it is an offset. We can always find a better way to structure this code in subsequent PRs, but it might make sense to slice the input `value` so this code always looks like the following (no offset) -- and eventually remove the offset parameter all together ```rust let value = i32::from_le_bytes(array_from_slice(value, 0)?); ``` That being said since it is all crate private, I think this is totally good. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org