liukun4515 commented on issue #9981: URL: https://github.com/apache/arrow-datafusion/issues/9981#issuecomment-2053634693
> I didn't quite follow your description -- is the issue that your data is already in UTC-7 time (and thus should not be adjusted) but that DataFusion is adjusting the timezone anyways?

@alamb Sorry for the unclear description; I will share my Parquet file next week.

Our ETL engine (Spark) writes the Parquet files to HDFS, and, as we all know, Spark stores timestamps as UTC (relative to the Unix epoch). In arrow-rs, however, an INT96 column is mapped to the Arrow data type `DataType::Timestamp(TimeUnit::Nanosecond, None)` by this code: https://github.com/apache/arrow-rs/blob/a999fb86764e9310bb4822c7e7c6551f247e0e0b/parquet/src/arrow/schema/primitive.rs#L99

According to the definition of the timestamp type in the Arrow format (https://github.com/apache/arrow/blob/main/format/Schema.fbs#L303), a timestamp type without a timezone value means we don't know the reference point of the timestamp: https://github.com/apache/arrow/blob/main/format/Schema.fbs#L318
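To make the INT96 point concrete, here is a minimal Rust sketch of how a legacy INT96 timestamp is commonly decoded, assuming the usual Impala/Spark layout: 8 little-endian bytes of nanoseconds within the day, followed by a 4-byte little-endian Julian day number. The function names `int96_to_unix_nanos` and `encode_int96` are hypothetical and are not the arrow-rs API. The decoded value is only a count of nanoseconds since the Unix epoch; nothing in the 12 bytes records a timezone, so the reader has to decide whether to attach one.

```rust
/// Julian day number of the Unix epoch (1970-01-01).
const JULIAN_DAY_OF_EPOCH: i64 = 2_440_588;
/// Nanoseconds in one day.
const NANOS_PER_DAY: i64 = 86_400 * 1_000_000_000;

/// Hypothetical helper: decodes a 12-byte legacy INT96 timestamp.
/// Assumed layout: bytes 0..8 = nanoseconds within the day (little-endian i64),
/// bytes 8..12 = Julian day number (little-endian i32).
/// Returns nanoseconds since the Unix epoch; no timezone is stored in the bytes.
fn int96_to_unix_nanos(bytes: [u8; 12]) -> i64 {
    let mut nanos_buf = [0u8; 8];
    nanos_buf.copy_from_slice(&bytes[0..8]);
    let mut day_buf = [0u8; 4];
    day_buf.copy_from_slice(&bytes[8..12]);

    let nanos_of_day = i64::from_le_bytes(nanos_buf);
    let julian_day = i32::from_le_bytes(day_buf) as i64;
    (julian_day - JULIAN_DAY_OF_EPOCH) * NANOS_PER_DAY + nanos_of_day
}

/// Hypothetical helper: builds an INT96 value from a Julian day and
/// nanoseconds-of-day, using the same assumed layout (for illustration only).
fn encode_int96(julian_day: i32, nanos_of_day: i64) -> [u8; 12] {
    let mut out = [0u8; 12];
    out[0..8].copy_from_slice(&nanos_of_day.to_le_bytes());
    out[8..12].copy_from_slice(&julian_day.to_le_bytes());
    out
}

fn main() {
    // Julian day 2_440_589 is 1970-01-02; add one hour of nanoseconds.
    let raw = encode_int96(2_440_589, 3_600 * 1_000_000_000);
    let nanos = int96_to_unix_nanos(raw);
    println!("{}", nanos);
}
```

Running it prints `90000000000000`, i.e. one day plus one hour after the epoch. Whether that instant should be read as UTC (as Spark wrote it) or as a zone-less wall-clock time is exactly what `Timestamp(TimeUnit::Nanosecond, None)` leaves open.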
