sweb commented on a change in pull request #9047:
URL: https://github.com/apache/arrow/pull/9047#discussion_r550881168



##########
File path: rust/parquet/src/arrow/schema.rs
##########
@@ -591,6 +591,7 @@ impl ParquetTypeConverter<'_> {
             LogicalType::INT_32 => Ok(DataType::Int32),
             LogicalType::DATE => Ok(DataType::Date32(DateUnit::Day)),
             LogicalType::TIME_MILLIS => 
Ok(DataType::Time32(TimeUnit::Millisecond)),
+            LogicalType::DECIMAL => self.to_decimal(),

Review comment:
       I added an implementation using converters (I have not pushed it yet), 
but the creation of `ParquetTypeConverter` occurs before we even get to 
`build_for_primitive_type_inner`:
   
   ```
   ParquetFileArrowReader::get_record_reader ->
   ParquetFileArrowReader::get_record_reader_by_columns -> 
   ParquetFileArrowReader::get_schema ->
   schema::parquet_to_arrow_schema ->
   schema::parquet_to_arrow_schema_by_columns ->
   ParquetTypeConverter::new
   ```
   Thus, if we want to avoid the decimal case, a decimal specific 
implementation in `get_schema` would probably be required.
   
   The converters could avoid the decimal specific branch in 
PrimitiveArrayReader::next_batch:
   
   ```
   ArrowType::Decimal(p, s) => {
                   let to_int64 = arrow::compute::cast(&array, 
&ArrowType::Int64)?;
                   let mut builder = DecimalBuilder::new(to_int64.len(), *p, 
*s);
                   let values = 
to_int64.as_any().downcast_ref::<Int64Array>().unwrap();
                   for maybe_value in values.iter() {
                       match maybe_value {
                           Some(value) => builder.append_value(value as i128)?,
                           None => builder.append_null()?
                       }
                   }
                   Arc::new(builder.finish()) as ArrayRef
               }
   ```
   
   We could avoid the cast to i64 by adding converters for i32/i64 for but I 
think this is not such a big deal - what do you think?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to