rtyler opened a new issue, #4075: URL: https://github.com/apache/arrow-rs/issues/4075
**Which part is this question about**

I am using the parquet crate through delta-rs and trying to understand the disconnect between Delta's interpretation of `timestamp` and Parquet's. For example, [Delta considers timestamps as microseconds since epoch](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#primitive-types).

**Describe your question**

The parquet format docs have a [dedicated timestamp type](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#primitive-types), which I don't believe Delta is using. The parquet files written by [Delta](https://github.com/delta-io/delta) (the Spark implementation) use the INT96 physical type instead. The `parquet-tools` CLI shows the column type from a `.parquet` file as:

```
############ Column(timestamp) ############
name: timestamp
path: timestamp
max_definition_level: 1
max_repetition_level: 0
physical_type: INT96
logical_type: None
converted_type (legacy): NONE
compression: SNAPPY (space_saved: 13%)
```

When I modify the `read_parquet.rs` example, the schema of the `RecordBatch` coming from an example file with the above column is:

```
Field { name: "timestamp", data_type: Timestamp(Nanosecond, None), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }
```

I am assuming that the code doing this conversion of the INT96 column to a timestamp is in `consume_batch` within `primitive_array.rs`, but I'm not entirely sure.

I'm hoping for some help figuring out where the disconnect might be between how Delta Lake thinks "timestamp" should look (microseconds) and the Parquet Rust reader, which coerces that INT96 to nanoseconds.
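For context on the nanosecond/microsecond gap, here is a minimal sketch of how a legacy INT96 timestamp is conventionally laid out and decoded: 8 little-endian bytes of nanoseconds within the day, followed by 4 little-endian bytes holding the Julian day number. The function names (`int96_to_nanos`, `nanos_to_micros`, `encode_int96`) are hypothetical and written only for illustration; this is not the code path used by the parquet crate, just the arithmetic that would explain why a nanosecond value comes out and how it relates to Delta's microseconds.

```rust
/// Julian day number of the Unix epoch (1970-01-01).
const JULIAN_DAY_OF_EPOCH: i64 = 2_440_588;
/// Nanoseconds in one day.
const NANOS_PER_DAY: i64 = 86_400 * 1_000_000_000;

/// Hypothetical helper: decode a 12-byte INT96 value into nanoseconds
/// since the Unix epoch. Returns `None` if the result does not fit in
/// an i64 (dates roughly outside the years 1677..2262).
fn int96_to_nanos(bytes: [u8; 12]) -> Option<i64> {
    // First 8 bytes: nanoseconds elapsed within the day (little-endian).
    let mut lo = [0u8; 8];
    lo.copy_from_slice(&bytes[..8]);
    let nanos_of_day = i64::from_le_bytes(lo);

    // Last 4 bytes: Julian day number (little-endian).
    let mut hi = [0u8; 4];
    hi.copy_from_slice(&bytes[8..]);
    let julian_day = i64::from(u32::from_le_bytes(hi));

    (julian_day - JULIAN_DAY_OF_EPOCH)
        .checked_mul(NANOS_PER_DAY)?
        .checked_add(nanos_of_day)
}

/// Hypothetical helper: truncate nanoseconds to microseconds, the unit
/// the Delta protocol uses. `div_euclid` rounds toward negative infinity,
/// so pre-epoch values are floored rather than rounded toward zero.
fn nanos_to_micros(nanos: i64) -> i64 {
    nanos.div_euclid(1_000)
}

/// Hypothetical helper used only to build example input bytes.
fn encode_int96(julian_day: u32, nanos_of_day: i64) -> [u8; 12] {
    let mut out = [0u8; 12];
    out[..8].copy_from_slice(&nanos_of_day.to_le_bytes());
    out[8..].copy_from_slice(&julian_day.to_le_bytes());
    out
}
```

Under these assumptions, the INT96 value carries more precision than Delta's microseconds, so any sub-microsecond digits are lost when converting down with `nanos_to_micros`.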
