sunchao commented on a change in pull request #9612:
URL: https://github.com/apache/arrow/pull/9612#discussion_r593855164



##########
File path: rust/parquet/src/arrow/schema.rs
##########
@@ -583,48 +626,109 @@ impl ParquetTypeConverter<'_> {
     }
 
     fn from_int32(&self) -> Result<DataType> {
-        match self.schema.get_basic_info().converted_type() {
-            ConvertedType::NONE => Ok(DataType::Int32),
-            ConvertedType::UINT_8 => Ok(DataType::UInt8),
-            ConvertedType::UINT_16 => Ok(DataType::UInt16),
-            ConvertedType::UINT_32 => Ok(DataType::UInt32),
-            ConvertedType::INT_8 => Ok(DataType::Int8),
-            ConvertedType::INT_16 => Ok(DataType::Int16),
-            ConvertedType::INT_32 => Ok(DataType::Int32),
-            ConvertedType::DATE => Ok(DataType::Date32),
-            ConvertedType::TIME_MILLIS => 
Ok(DataType::Time32(TimeUnit::Millisecond)),
-            ConvertedType::DECIMAL => Ok(self.to_decimal()),
-            other => Err(ArrowError(format!(
-                "Unable to convert parquet INT32 logical type {}",
-                other
+        match (
+            self.schema.get_basic_info().logical_type(),

Review comment:
       nit: perhaps we can first "merge" the logical and converted type into a 
logical type and then do the conversion, to avoid some of the code 
duplications. In the case when logical type is not present, we can always 
convert the converted type into a logical type while losing some information.
   
   We can do this as a follow-up though.

##########
File path: rust/parquet/src/arrow/schema.rs
##########
@@ -673,6 +784,7 @@ impl ParquetTypeConverter<'_> {
     /// This function takes care of logical type and repetition.
     fn to_group_type(&self) -> Result<Option<DataType>> {
         if self.is_repeated() {
+            dbg!(self.schema.get_basic_info());

Review comment:
       nit: remove?

##########
File path: rust/parquet/src/arrow/schema.rs
##########
@@ -364,32 +385,51 @@ fn arrow_to_parquet_type(field: &Field) -> Result<Type> {
         DataType::Float64 => Type::primitive_type_builder(name, 
PhysicalType::DOUBLE)
             .with_repetition(repetition)
             .build(),
-        DataType::Timestamp(time_unit, _) => {
+        DataType::Timestamp(time_unit, zone) => {
             Type::primitive_type_builder(name, PhysicalType::INT64)
-                .with_converted_type(match time_unit {
-                    TimeUnit::Second => ConvertedType::TIMESTAMP_MILLIS,
-                    TimeUnit::Millisecond => ConvertedType::TIMESTAMP_MILLIS,
-                    TimeUnit::Microsecond => ConvertedType::TIMESTAMP_MICROS,
-                    TimeUnit::Nanosecond => ConvertedType::TIMESTAMP_MICROS,
-                })
+                .with_logical_type(Some(LogicalType::TIMESTAMP(TimestampType {
+                    is_adjusted_to_u_t_c: matches!(zone, Some(z) if z.as_str() 
== "UTC"),

Review comment:
       Hmm this means we'll lose the timezone info right? as 
`is_adjusted_to_u_t_c` means using local timezone. 

##########
File path: rust/parquet/src/schema/types.rs
##########
@@ -419,6 +439,79 @@ impl<'a> PrimitiveTypeBuilder<'a> {
             precision: self.precision,
         })
     }
+
+    #[inline]
+    fn test_decimal_precision_scale(&self) -> Result<()> {

Review comment:
       nit: maybe rename `test` to `check` or something else. It's easy to 
misread this as a test case.

##########
File path: rust/parquet/src/schema/types.rs
##########
@@ -479,13 +572,17 @@ impl<'a> GroupTypeBuilder<'a> {
 
     /// Creates a new `GroupType` instance from the gathered attributes.
     pub fn build(self) -> Result<Type> {
-        let basic_info = BasicTypeInfo {
+        let mut basic_info = BasicTypeInfo {
             name: String::from(self.name),
             repetition: self.repetition,
             converted_type: self.converted_type,
-            logical_type: self.logical_type,
+            logical_type: self.logical_type.clone(),
             id: self.id,
         };
+        // Populate the converted type if only the logical type is populated

Review comment:
       we might need more tests for the case when logical type is set.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to