klion26 commented on code in PR #8516:
URL: https://github.com/apache/arrow-rs/pull/8516#discussion_r2404021331


##########
parquet-variant-compute/src/type_conversion.rs:
##########
@@ -38,12 +38,33 @@ pub(crate) trait PrimitiveFromVariant: ArrowPrimitiveType {
     fn from_variant(variant: &Variant<'_, '_>) -> Option<Self::Native>;
 }
 
+/// Extension trait for Arrow timestamp types that can extract their native value from a Variant
+/// We can't use [`PrimitiveFromVariant`] directly because we might need to use methods that
+/// are only available on [`ArrowTimestampType`] (such as with_timezone_opt)
+pub(crate) trait TimestampFromVariant: ArrowTimestampType {
+    fn from_variant(variant: &Variant<'_, '_>) -> Option<Self::Native>;
+}
+
 /// Macro to generate PrimitiveFromVariant implementations for Arrow primitive types
 macro_rules! impl_primitive_from_variant {
-    ($arrow_type:ty, $variant_method:ident) => {
+    ($arrow_type:ty, $variant_method:ident $(, $cast_fn:expr)?) => {
         impl PrimitiveFromVariant for $arrow_type {
             fn from_variant(variant: &Variant<'_, '_>) -> Option<Self::Native> {
-                variant.$variant_method()
+                let value = variant.$variant_method();
+                $( let value = value.map($cast_fn); )?
+                value
+            }
+        }
+    };
+    ($arrow_type:ty, $( $variant_type:pat => $variant_method:ident, $cast_fn:expr ),+ $(,)?) => {
+        impl TimestampFromVariant for $arrow_type {
+            fn from_variant(variant: &Variant<'_, '_>) -> Option<Self::Native> {
+                match variant {
+                    $(
+                        $variant_type => variant.$variant_method().map($cast_fn),
+                    )+
+                    _ => None
+                }

Review Comment:
   OK, I'll try it this way.
   
   > But the current code also allows invalid conversions, such as interpreting
   > an NTZ timestamp as UTC, because the current as_timestamp_xxx methods are too
   > narrow of a waist and lose information.
   
   Does this refer to the `as_timestamp_xxx` methods themselves, or to the
   end-to-end variant-to-Arrow conversion here? If the former, then yes, they may
   be wrong (or perhaps we can treat the return value as the *physically* stored
   value). If the latter, we attach the timezone info when initializing the
   builder.
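
A minimal sketch of the two readings of that question, using only the standard library. `Ts`, `TimestampColumn`, `as_raw_micros` and `build_column` are hypothetical stand-ins, not types or functions from `parquet-variant` or `arrow`. The sketch assumes the timezone lives on the column rather than on each value, and it shows one possible strict policy (mismatched values become nulls), which is not necessarily what this PR does.

```rust
/// Hypothetical stand-in for the two microsecond timestamp variants.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Ts {
    /// An instant: microseconds since 1970-01-01T00:00:00Z.
    Utc(i64),
    /// A wall-clock reading with no zone attached.
    Ntz(i64),
}

/// Hypothetical stand-in for a timestamp column: raw i64 values plus an
/// optional timezone annotation that belongs to the column as a whole.
#[derive(Debug, PartialEq)]
pub struct TimestampColumn {
    pub timezone: Option<String>,
    pub values: Vec<Option<i64>>,
}

/// The "narrow waist": both variants collapse to the same raw i64, so the
/// caller can no longer tell which kind of timestamp it came from.
pub fn as_raw_micros(ts: Ts) -> i64 {
    match ts {
        Ts::Utc(v) | Ts::Ntz(v) => v,
    }
}

/// Builds a column whose timezone is fixed up front. A value whose kind does
/// not match the column becomes a null instead of being reinterpreted.
pub fn build_column(timezone: Option<&str>, input: &[Ts]) -> TimestampColumn {
    let values = input
        .iter()
        .map(|ts| match (*ts, timezone) {
            (Ts::Utc(v), Some(_)) | (Ts::Ntz(v), None) => Some(v),
            _ => None,
        })
        .collect();
    TimestampColumn {
        timezone: timezone.map(str::to_owned),
        values,
    }
}
```

With `as_raw_micros` alone, a builder configured for `+00:00` would accept an NTZ value without noticing. `build_column` can reject it only because it still sees the variant kind.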



##########
parquet-variant-compute/src/type_conversion.rs:
##########
@@ -60,6 +65,44 @@ impl_primitive_from_variant!(datatypes::UInt64Type, as_u64);
 impl_primitive_from_variant!(datatypes::Float16Type, as_f16);
 impl_primitive_from_variant!(datatypes::Float32Type, as_f32);
 impl_primitive_from_variant!(datatypes::Float64Type, as_f64);
+impl_primitive_from_variant!(
+    datatypes::Date32Type,
+    as_naive_date,
+    Date32Type::from_naive_date
+);
+
+pub(crate) trait TimestampFromVariant: ArrowTimestampType {
+    fn from_variant(variant: &Variant<'_, '_>) -> Option<Self::Native>;
+}
+
+macro_rules! impl_timestamp_from_variant {
+    ($timestamp_type:ty, {
+        $(($variant_pattern:pat, $conversion:expr)),+ $(,)?
+    }) => {
+        impl TimestampFromVariant for $timestamp_type {
+            fn from_variant(variant: &Variant<'_, '_>) -> Option<Self::Native> {
+                match variant {
+                    $(
+                        $variant_pattern => $conversion,
+                    )+
+                    _ => None,
+                }
+            }
+        }
+    };
+}
+
+impl_timestamp_from_variant!(TimestampMicrosecondType, {
+    (Variant::TimestampMicros(t), Some(t.timestamp_micros())),
+    (Variant::TimestampNtzMicros(t), Some(t.and_utc().timestamp_micros())),
+});
+
+impl_timestamp_from_variant!(TimestampNanosecondType, {
+    (Variant::TimestampMicros(t), Some(t.timestamp_micros()).map(|t| t * 1000)),
+    (Variant::TimestampNtzMicros(t), Some(t.and_utc().timestamp_micros()).map(|t| t * 1000)),
+    (Variant::TimestampNanos(t), t.timestamp_nanos_opt()),
+    (Variant::TimestampNtzNanos(t), t.and_utc().timestamp_nanos_opt()),
+});

Review Comment:
   > We can "safely" convert a TZ type to an NTZ type
   
   No, I don't think we can; it would produce the *wrong* result. The timestamp
   (the long value) for TZ is measured from `1970-01-01 00:00:00 at +00:00`,
   whereas for NTZ it is measured from `1970-01-01 00:00:00 in the local
   timezone`.
   
   > But arrow doesn't distinguish physically between TZ and NTZ
   
   IIUC, we don't need to distinguish the two when physically storing the value;
   both are the offset between a moment in time and a reference point
   (`1970-01-01 00:00:00 at +00:00` for TZ, and `1970-01-01 00:00:00 in the local
   timezone` for NTZ).
   
   > So maybe the correct approach will be to add
   > Variant::as_timestamp[_ntz]_[micros|nanos] methods,
   
   Splitting these into separate TZ and NTZ versions,
   `Variant::as_timestamp[_ntz]_[micros|nanos]`, that return `DateTime<Utc>` and
   `NaiveDateTime` respectively seems like a better idea here.
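
A rough sketch of what separate accessors could look like, assuming the split described above. To stay dependency-free it uses hypothetical newtypes `UtcMicros` and `NaiveMicros` in place of chrono's `DateTime<Utc>` and `NaiveDateTime`, and a hypothetical `MiniVariant` in place of the real `Variant`. None of these names are from the crate.

```rust
/// Hypothetical stand-in for `DateTime<Utc>`: an instant in microseconds.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct UtcMicros(pub i64);

/// Hypothetical stand-in for `NaiveDateTime`: a zone-less wall-clock reading.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct NaiveMicros(pub i64);

/// Hypothetical, heavily reduced variant type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum MiniVariant {
    TimestampMicros(UtcMicros),
    TimestampNtzMicros(NaiveMicros),
    Int64(i64),
}

impl MiniVariant {
    /// Succeeds only for the timezone-aware variant.
    pub fn as_timestamp_micros(&self) -> Option<UtcMicros> {
        match *self {
            MiniVariant::TimestampMicros(t) => Some(t),
            _ => None,
        }
    }

    /// Succeeds only for the zone-less variant.
    pub fn as_timestamp_ntz_micros(&self) -> Option<NaiveMicros> {
        match *self {
            MiniVariant::TimestampNtzMicros(t) => Some(t),
            _ => None,
        }
    }
}
```

Because the two return types differ, passing an NTZ value where a UTC instant is expected becomes a compile error rather than a silent reinterpretation.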



##########
parquet-variant/src/variant.rs:
##########
@@ -561,6 +561,72 @@ impl<'m, 'v> Variant<'m, 'v> {
         }
     }
 
+    /// Converts this variant to an `i64` representing microseconds since the Unix epoch if possible.
+    /// This is useful when converting the variant to arrow types.
+    ///
+    /// Returns Some(i64) for [`Variant::TimestampMicros`] and [`Variant::TimestampNtzMicros`],
+    /// None for the other variant types.
+    ///
+    /// ```
+    /// use parquet_variant::Variant;
+    /// use chrono::NaiveDate;
+    ///
+    /// // you can extract an i64 from Variant::TimestampMicros
+    /// let datetime = NaiveDate::from_ymd_opt(2025, 10, 03).unwrap().and_hms_milli_opt(12, 34, 56, 789).unwrap().and_utc();
+    /// let v1 = Variant::from(datetime);
+    /// assert_eq!(v1.as_timestamp_micros(), Some(1759494896789000));
+    ///
+    /// // or Variant::TimestampNtzMicros
+    /// let datetime_ntz = NaiveDate::from_ymd_opt(2025, 10, 03).unwrap().and_hms_milli_opt(12, 34, 56, 789).unwrap();
+    /// let v2 = Variant::from(datetime_ntz);
+    /// assert_eq!(v2.as_timestamp_micros(), Some(1759494896789000));
+    ///
+    /// // but not from other variants
+    /// let datetime_nanos = NaiveDate::from_ymd_opt(2025, 10, 03).unwrap().and_hms_nano_opt(12, 34, 56, 789123456).unwrap().and_utc();
+    /// let v3 = Variant::from(datetime_nanos);
+    /// assert_eq!(v3.as_timestamp_micros(), None);
+    /// ```
+    pub fn as_timestamp_micros(&self) -> Option<i64> {
+        match *self {
+            Variant::TimestampMicros(d) => Some(d.timestamp_micros()),
+            Variant::TimestampNtzMicros(d) => Some(d.and_utc().timestamp_micros()),

Review Comment:
   I'm not sure I fully understand this. If `lossy` here means that we lose the
   timezone info, then yes, it is lossy. The `timestamp` here is the physically
   stored value (of type long) for the `NaiveDateTime` and `DateTime<Utc>`. If we
   returned `Option<NaiveDateTime>` and `Option<DateTime<Utc>>`, separating the
   NTZ and TZ versions would be the better idea, but when the return value is
   `Option<i64>` (the underlying long timestamp value), does it make any
   difference whether they are separate?
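
One way to look at what the bare `i64` leaves out: a zone-less reading only becomes an instant once a UTC offset is supplied, so the same number can denote different instants. The sketch below is illustrative only. `wall_clock_to_instant_micros` is a hypothetical helper, not part of `parquet-variant`, `arrow` or chrono, and it assumes a fixed offset in seconds east of UTC.

```rust
const MICROS_PER_SECOND: i64 = 1_000_000;

/// Converts a wall-clock reading (microseconds counted as if the clock showed
/// UTC) into an instant, given the fixed UTC offset of the zone the clock was
/// read in. Returns `None` on arithmetic overflow.
pub fn wall_clock_to_instant_micros(naive_micros: i64, utc_offset_seconds: i32) -> Option<i64> {
    // Local time = UTC + offset, so the instant is the reading minus the offset.
    let offset_micros = i64::from(utc_offset_seconds).checked_mul(MICROS_PER_SECOND)?;
    naive_micros.checked_sub(offset_micros)
}
```

For the reading `2025-10-03 12:34:56.789` (`1759494896789000` when treated as UTC), an offset of `0` returns the same number, while an offset of `+08:00` (`28800` seconds) returns `1759466096789000`, eight hours earlier. A merged `Option<i64>` accessor cannot tell the caller which of these interpretations applies.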



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
