pacman82 commented on issue #2984:
URL: https://github.com/apache/arrow-rs/issues/2984#issuecomment-1299672225
Great test. I am very sorry; I should have been much clearer about how to
reproduce the symptom I am seeing. I took your test, but modified it to
write the data into an actual file called `tmp.par`.
```rust
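// Write the encoded buffer to disk so it can be inspected with external tools,
// instead of asserting against an in-memory reader: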
std::fs::write("tmp.par", data).unwrap();
// let bytes = bytes::Bytes::from(data);
// let reader = SerializedFileReader::new(bytes).unwrap();
// assert_eq!(reader.metadata().file_metadata().schema(), schema.as_ref());
// assert_eq!(
// reader.metadata().file_metadata().schema().get_fields()[0]
// .get_basic_info()
// .converted_type(),
// ConvertedType::TIMESTAMP_MICROS
// );
```
Now, inspecting the file with parquet-tools:
```shell
pip install parquet-tools
parquet-tools inspect tmp.par
```
It yields:
```
serialized_size: 143
############ Columns ############
col1
############ Column(col1) ############
name: col1
path: col1
max_definition_level: 0
max_repetition_level: 0
physical_type: INT64
logical_type: Timestamp(isAdjustedToUTC=false, timeUnit=microseconds, is_from_converted_type=false, force_set_converted_type=false)
converted_type (legacy): NONE
compression: UNCOMPRESSED (space_saved: 0%)
```
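For contrast, reading the same file back with the Rust reader reports the converted type the commented-out assertions above expect. A minimal sketch using the same calls as in the test, assuming `tmp.par` was written as shown earlier:
```rust
use std::fs::File;

use parquet::basic::ConvertedType;
use parquet::file::reader::{FileReader, SerializedFileReader};

fn main() {
    let file = File::open("tmp.par").unwrap();
    let reader = SerializedFileReader::new(file).unwrap();
    let converted = reader.metadata().file_metadata().schema().get_fields()[0]
        .get_basic_info()
        .converted_type();
    // parquet-rs derives TIMESTAMP_MICROS from the logical type here,
    // while parquet-tools reports the legacy converted_type as NONE.
    assert_eq!(converted, ConvertedType::TIMESTAMP_MICROS);
}
```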
It is very interesting to see that the parquet reader implementations disagree
here. For context: the issue surfaced because timestamps output by `odbc2parquet`
can no longer be interpreted by Azure Data Lake since the migration to logical
types.
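I am not sure where the legacy field gets lost, but for illustration, here is a hedged sketch of what explicitly carrying both annotations might look like with the schema builder API. `timestamp_micros_field` is just an illustrative name, not something `odbc2parquet` actually does:
```rust
use std::sync::Arc;

use parquet::basic::{ConvertedType, LogicalType, Repetition, TimeUnit, Type as PhysicalType};
use parquet::schema::types::Type;

// Hypothetical helper: build a column that carries the modern logical type
// *and* the legacy converted type, so older readers that only look at
// converted_type still see TIMESTAMP_MICROS.
fn timestamp_micros_field() -> Arc<Type> {
    Arc::new(
        Type::primitive_type_builder("col1", PhysicalType::INT64)
            .with_repetition(Repetition::OPTIONAL)
            .with_logical_type(Some(LogicalType::Timestamp {
                is_adjusted_to_u_t_c: false,
                unit: TimeUnit::MICROS(Default::default()),
            }))
            .with_converted_type(ConvertedType::TIMESTAMP_MICROS)
            .build()
            .unwrap(),
    )
}
```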
Thanks for your help so far, and sorry for not being clearer in the
beginning. I typed some of these comments on a phone, which made me err on
the side of brevity.
Best, Markus