Re: [I] get error value if timestamp represented by the INT96 in the parquet file [arrow-datafusion]

via GitHub Sat, 13 Apr 2024 05:36:24 -0700


liukun4515 commented on issue #9981:
URL: 
https://github.com/apache/arrow-datafusion/issues/9981#issuecomment-2053634693


   > I didn't quite follow your description -- is the issue that your data is 
already in UTC-7 time (and thus should not be adjusted) but that DataFusion is 
adjusting the timezone anyways?
   
   @alamb 
   
   Sorry for the unclear description. I will share my Parquet file next week.
   
   Our ETL engine (Spark) writes the Parquet files to HDFS, and as we all know, 
Spark uses UTC/Unix epoch time. 
   
   But in arrow-rs, when we encounter INT96, we get the Arrow data type 
`DataType::Timestamp(TimeUnit::Nanosecond, None)` from this code: 
https://github.com/apache/arrow-rs/blob/a999fb86764e9310bb4822c7e7c6551f247e0e0b/parquet/src/arrow/schema/primitive.rs#L99
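   
To make the INT96 point concrete, here is a minimal sketch of how such a value is commonly decoded. It assumes the usual Spark/Impala layout: 12 little-endian bytes, where the first 8 are nanoseconds within the day and the last 4 are the Julian day number. The function names are hypothetical and this is not the arrow-rs implementation. The result is a bare `i64` of nanoseconds with no timezone attached, which matches the `None` in the Arrow type above.

```rust
/// Julian day number of the Unix epoch (1970-01-01).
const JULIAN_DAY_OF_EPOCH: i64 = 2_440_588;
/// Nanoseconds in one day.
const NANOS_PER_DAY: i64 = 86_400 * 1_000_000_000;

/// Hypothetical helper: decode a 12-byte INT96 timestamp into nanoseconds
/// relative to 1970-01-01T00:00:00. Assumes the layout described above.
/// No timezone information is stored in the value itself.
fn int96_to_unix_nanos(raw: [u8; 12]) -> i64 {
    // Bytes 0..8: nanoseconds elapsed within the day (little-endian).
    let mut nanos_bytes = [0u8; 8];
    nanos_bytes.copy_from_slice(&raw[0..8]);
    let nanos_of_day = i64::from_le_bytes(nanos_bytes);

    // Bytes 8..12: Julian day number (little-endian).
    let mut day_bytes = [0u8; 4];
    day_bytes.copy_from_slice(&raw[8..12]);
    let julian_day = i64::from(u32::from_le_bytes(day_bytes));

    (julian_day - JULIAN_DAY_OF_EPOCH) * NANOS_PER_DAY + nanos_of_day
}

/// Hypothetical inverse, only used to build example values.
fn unix_nanos_to_int96(ts: i64) -> [u8; 12] {
    // Euclidean division keeps nanos_of_day non-negative for pre-1970 values.
    let days = ts.div_euclid(NANOS_PER_DAY);
    let nanos_of_day = ts.rem_euclid(NANOS_PER_DAY);
    let julian_day = (days + JULIAN_DAY_OF_EPOCH) as u32;

    let mut raw = [0u8; 12];
    raw[0..8].copy_from_slice(&nanos_of_day.to_le_bytes());
    raw[8..12].copy_from_slice(&julian_day.to_le_bytes());
    raw
}
```

Whether the decoded number is read as a UTC instant or as a local wall-clock time is decided by the reader, not by the bytes, and that choice is the question in this issue.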
   
   According to the definition of timestamp in the Arrow data types 
https://github.com/apache/arrow/blob/main/format/Schema.fbs#L303, if the 
timestamp type has no timezone value, it means we don't know the reference 
of the timestamp: 
https://github.com/apache/arrow/blob/main/format/Schema.fbs#L318
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
