rtyler opened a new issue, #4075:
URL: https://github.com/apache/arrow-rs/issues/4075

   **Which part is this question about**
   
   I am using the parquet crate through delta-rs and trying to understand the 
disconnect between Delta's interpretation of `timestamp` and parquet. For 
example, [Delta considers timestamps as microseconds since 
epoch](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#primitive-types)
   
   
   **Describe your question**
   
   The parquet format docs have a [dedicated timestamp 
type](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#primitive-types)
 which I don't believe Delta is using. The parquet files written by 
[Delta](https://github.com/delta-io/delta) (the Spark implementation) store 
timestamps as the INT96 physical type.
   
   The `parquet-tools` CLI shows the column type from a `.parquet` file as:
   
   ```
   ############ Column(timestamp) ############
   name: timestamp
   path: timestamp
   max_definition_level: 1
   max_repetition_level: 0
   physical_type: INT96
   logical_type: None
   converted_type (legacy): NONE
   compression: SNAPPY (space_saved: 13%)
   ```
   
   When I modify the `read_parquet.rs` example, the schema of `RecordBatch` 
coming from an example file with the above column is:
   
   ```
   Field { name: "timestamp", data_type: Timestamp(Nanosecond, None), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }
   ```
   
   I am assuming that the code which is doing this conversion of the INT96 
column to a timestamp is in `consume_batch` within `primitive_array.rs`, but I'm 
not entirely sure.
   
   
   I'm hoping for some help figuring out where the disconnect might be between 
how Delta Lake thinks "timestamp" should look (microseconds) versus the Parquet 
Rust reader which coerces that INT96 to nanoseconds.
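
As a point of reference, the widely used INT96 timestamp convention can be sketched as below. This is a standalone illustration, not the parquet crate's code: it assumes the 12 bytes hold nanoseconds within the day (first 8 bytes, little-endian) followed by a Julian day number (last 4 bytes, little-endian), and the function names `int96_to_nanos`, `nanos_to_micros` and `encode_int96` are hypothetical helpers invented for this example.

```rust
/// Julian day number of the Unix epoch (1970-01-01).
const JULIAN_DAY_OF_EPOCH: i64 = 2_440_588;
/// Number of nanoseconds in one day.
const NANOS_PER_DAY: i64 = 86_400 * 1_000_000_000;

/// Decode a 12-byte INT96 value into nanoseconds since the Unix epoch.
/// Assumed layout: bytes 0..8 = nanoseconds within the day (little-endian),
/// bytes 8..12 = Julian day number (little-endian).
/// An i64 nanosecond count only covers roughly the years 1677 to 2262.
fn int96_to_nanos(bytes: [u8; 12]) -> i64 {
    let mut nanos_buf = [0u8; 8];
    nanos_buf.copy_from_slice(&bytes[0..8]);
    let mut day_buf = [0u8; 4];
    day_buf.copy_from_slice(&bytes[8..12]);

    let nanos_of_day = i64::from_le_bytes(nanos_buf);
    let julian_day = i64::from(u32::from_le_bytes(day_buf));

    (julian_day - JULIAN_DAY_OF_EPOCH) * NANOS_PER_DAY + nanos_of_day
}

/// Convert nanoseconds since epoch to microseconds since epoch (the unit
/// Delta's protocol uses), rounding toward negative infinity.
fn nanos_to_micros(nanos: i64) -> i64 {
    nanos.div_euclid(1_000)
}

/// Hypothetical helper that builds an INT96 value for the example.
fn encode_int96(julian_day: u32, nanos_of_day: i64) -> [u8; 12] {
    let mut out = [0u8; 12];
    out[0..8].copy_from_slice(&nanos_of_day.to_le_bytes());
    out[8..12].copy_from_slice(&julian_day.to_le_bytes());
    out
}
```

Under these assumptions, `int96_to_nanos(encode_int96(2_440_589, 1_500))` yields `86_400_000_001_500` nanoseconds (one day plus 1,500 ns after the epoch), and `nanos_to_micros` reduces that to `86_400_000_001` microseconds. The sub-microsecond remainder is dropped, which is the precision difference between a nanosecond reader and a microsecond protocol.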
   
   I'm trying to figure out 
   
   **Additional context**
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.