pacman82 commented on issue #2984:
URL: https://github.com/apache/arrow-rs/issues/2984#issuecomment-1299672225
Great test. I am very sorry; I should have been much clearer about how to
reproduce the symptom I am seeing. I took your test, but modified it to
write the data into an actual file called `tmp.par`.
```rust
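// Write the encoded buffer to disk so it can be inspected with external tools,
// instead of asserting against an in-memory reader: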
std::fs::write("tmp.par", data).unwrap();
// let bytes = bytes::Bytes::from(data);
// let reader = SerializedFileReader::new(bytes).unwrap();
// assert_eq!(reader.metadata().file_metadata().schema(), schema.as_ref());
// assert_eq!(
// reader.metadata().file_metadata().schema().get_fields()[0]
// .get_basic_info()
// .converted_type(),
// ConvertedType::TIMESTAMP_MICROS
// );
```
Now, inspecting the file with parquet-tools:
```shell
pip install parquet-tools
parquet-tools inspect tmp.par
```
It yields:
```
serialized_size: 143
############ Columns ############
col1
############ Column(col1) ############
name: col1
path: col1
max_definition_level: 0
max_repetition_level: 0
physical_type: INT64
logical_type: Timestamp(isAdjustedToUTC=false, timeUnit=microseconds, is_from_converted_type=false, force_set_converted_type=false)
converted_type (legacy): NONE
compression: UNCOMPRESSED (space_saved: 0%)
```
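For contrast, reading the same file back with the Rust reader reports the converted type the commented-out assertions above expect. A minimal sketch using the same calls as in the test, assuming `tmp.par` was written as shown earlier:
```rust
use std::fs::File;

use parquet::basic::ConvertedType;
use parquet::file::reader::{FileReader, SerializedFileReader};

fn main() {
    let file = File::open("tmp.par").unwrap();
    let reader = SerializedFileReader::new(file).unwrap();
    let converted = reader.metadata().file_metadata().schema().get_fields()[0]
        .get_basic_info()
        .converted_type();
    // parquet-rs derives TIMESTAMP_MICROS from the logical type here,
    // while parquet-tools reports the legacy converted_type as NONE.
    assert_eq!(converted, ConvertedType::TIMESTAMP_MICROS);
}
```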
It is very interesting to see that the parquet reader implementations disagree
here. For context: the issue surfaced because timestamps output by `odbc2parquet`
can no longer be interpreted by Azure Data Lake since the migration to logical
types.
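I am not sure where the legacy field gets lost, but for illustration, here is a hedged sketch of what explicitly carrying both annotations might look like with the schema builder API. `timestamp_micros_field` is just an illustrative name, not something `odbc2parquet` actually does:
```rust
use std::sync::Arc;

use parquet::basic::{ConvertedType, LogicalType, Repetition, TimeUnit, Type as PhysicalType};
use parquet::schema::types::Type;

// Hypothetical helper: build a column that carries the modern logical type
// *and* the legacy converted type, so older readers that only look at
// converted_type still see TIMESTAMP_MICROS.
fn timestamp_micros_field() -> Arc<Type> {
    Arc::new(
        Type::primitive_type_builder("col1", PhysicalType::INT64)
            .with_repetition(Repetition::OPTIONAL)
            .with_logical_type(Some(LogicalType::Timestamp {
                is_adjusted_to_u_t_c: false,
                unit: TimeUnit::MICROS(Default::default()),
            }))
            .with_converted_type(ConvertedType::TIMESTAMP_MICROS)
            .build()
            .unwrap(),
    )
}
```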
Thanks for your help so far, and sorry for not being clearer in the
beginning. I typed some of these comments on a phone, which made me err on
the side of brevity.
Best, Markus