pacman82 opened a new issue, #3017:
URL: https://github.com/apache/arrow-rs/issues/3017
**Describe the bug**

This concerns the output written by the `parquet` crate. Declaring a column
to contain a microsecond timestamp using a `LogicalType` causes the written
file to **not** have a converted type, at least according to `parquet-tools`.
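
For reference, this is the legacy mapping I would expect to end up in the metadata. A minimal sketch, assuming the `From<Option<LogicalType>>` conversion in `parquet::basic` covers this case:

```rust
use parquet::{
    basic::{ConvertedType, LogicalType},
    format::{MicroSeconds, TimeUnit},
};

fn main() {
    let logical_type = LogicalType::Timestamp {
        is_adjusted_to_u_t_c: false,
        unit: TimeUnit::MICROS(MicroSeconds {}),
    };
    // A microsecond timestamp should map to the legacy TIMESTAMP_MICROS
    // converted type, so that readers which predate logical types (such as
    // the one Azure apparently uses) can still interpret the column.
    assert_eq!(
        ConvertedType::from(Some(logical_type)),
        ConvertedType::TIMESTAMP_MICROS
    );
}
```
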
**To Reproduce**

1. Write a file `tmp.par` with a single column of type Timestamp with
microsecond unit, using a logical type.
```rust
use std::sync::Arc;

use parquet::{
    basic::{LogicalType, Repetition, Type},
    data_type::Int64Type,
    file::{properties::WriterProperties, writer::SerializedFileWriter},
    format::{MicroSeconds, TimeUnit},
    schema::types,
};

fn main() {
    let mut data = Vec::with_capacity(1024);

    let logical_type = LogicalType::Timestamp {
        is_adjusted_to_u_t_c: false,
        unit: TimeUnit::MICROS(MicroSeconds {}),
    };
    let field = Arc::new(
        types::Type::primitive_type_builder("col1", Type::INT64)
            .with_logical_type(Some(logical_type))
            .with_repetition(Repetition::REQUIRED)
            .build()
            .unwrap(),
    );
    let schema = Arc::new(
        types::Type::group_type_builder("schema")
            .with_fields(&mut vec![field])
            .build()
            .unwrap(),
    );

    // Write data
    let props = Arc::new(WriterProperties::builder().build());
    let mut writer = SerializedFileWriter::new(&mut data, schema, props).unwrap();
    let mut row_group_writer = writer.next_row_group().unwrap();
    let mut column_writer = row_group_writer.next_column().unwrap().unwrap();
    column_writer
        .typed::<Int64Type>()
        .write_batch(&[1, 2, 3, 4], None, None)
        .unwrap();
    column_writer.close().unwrap();
    row_group_writer.close().unwrap();
    writer.close().unwrap();

    // Write file for inspection with parquet-tools
    std::fs::write("tmp.par", data).unwrap();
}
```
2. Install `parquet-tools` in a virtual environment and inspect the file:
```shell
pip install parquet-tools==0.2.11
parquet-tools inspect tmp.par
```
The resulting output indicates no converted type:
```
############ file meta data ############
created_by: parquet-rs version 26.0.0
num_columns: 1
num_rows: 4
num_row_groups: 1
format_version: 1.0
serialized_size: 143
############ Columns ############
col1
############ Column(col1) ############
name: col1
path: col1
max_definition_level: 0
max_repetition_level: 0
physical_type: INT64
logical_type: Timestamp(isAdjustedToUTC=false, timeUnit=microseconds, is_from_converted_type=false, force_set_converted_type=false)
converted_type (legacy): NONE
compression: UNCOMPRESSED (space_saved: 0%)
```
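
For comparison, here is a sketch that reads the file back with the `parquet` crate itself and prints both annotations as the crate sees them. This is only a sanity check, since the reader may re-derive the converted type from the logical type rather than take it from the Thrift footer:

```rust
use std::fs::File;

use parquet::file::reader::{FileReader, SerializedFileReader};

fn main() {
    let file = File::open("tmp.par").unwrap();
    let reader = SerializedFileReader::new(file).unwrap();
    let schema = reader.metadata().file_metadata().schema_descr();

    // Inspect the annotations of the first (and only) column.
    let basic_info = schema.column(0).self_type().get_basic_info();
    println!("logical type:   {:?}", basic_info.logical_type());
    println!("converted type: {:?}", basic_info.converted_type());
}
```
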
**Expected behavior**

I would have expected the converted type to show up in the metadata emitted
by `parquet-tools`.
**Additional context**

Triggered by upstream `odbc2parquet` issue
<https://github.com/pacman82/odbc2parquet/issues/284>. Azure does not seem to
be able to handle the output since the migration to `LogicalType`.

I previously misdiagnosed this as the converted type not being set correctly
in the schema information; that does in fact happen. See: #2984.
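
In case it helps anyone who hits the same problem in the meantime, a possible workaround sketch (untested against Azure) is to set the legacy annotation explicitly on the builder, in addition to the logical type:

```rust
use std::sync::Arc;

use parquet::{
    basic::{ConvertedType, LogicalType, Repetition, Type},
    format::{MicroSeconds, TimeUnit},
    schema::types,
};

fn main() {
    // Same field as in the reproduction, but with the legacy converted type
    // set explicitly alongside the logical type.
    let field = Arc::new(
        types::Type::primitive_type_builder("col1", Type::INT64)
            .with_logical_type(Some(LogicalType::Timestamp {
                is_adjusted_to_u_t_c: false,
                unit: TimeUnit::MICROS(MicroSeconds {}),
            }))
            .with_converted_type(ConvertedType::TIMESTAMP_MICROS)
            .with_repetition(Repetition::REQUIRED)
            .build()
            .unwrap(),
    );
    assert_eq!(
        field.get_basic_info().converted_type(),
        ConvertedType::TIMESTAMP_MICROS
    );
}
```
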
Thanks, any help is appreciated!