shardulm94 edited a comment on issue #778: ORC: Implement TestGenericData and 
fix reader and writer issues
URL: https://github.com/apache/incubator-iceberg/pull/778#issuecomment-600902410
 
 
   @rdblue The use case is a bit different here. The end user Java object is 
still LocalDateTime for timestamp and OffsetDateTime for timestamptz. The 
timestamptz implementation for ORC still uses OffsetDateTime internally and is 
consistent with Avro and Parquet. I should perhaps have made the comment 
clearer.
   
   TL;DR
   For timestamptz, it's the same across all 3 formats
   For timestamp,
   ```
   Avro & Parquet: LocalDateTime -> OffsetDateTime -> File -> OffsetDateTime -> 
LocalDateTime
   ORC:            LocalDateTime -> ZonedDateTime  -> File -> ZonedDateTime  -> 
LocalDateTime
   ```
   
   Details:
   This is in relation to timestamp. In Avro and Parquet we model LocalDateTime 
as a UTC OffsetDateTime and store it as an int64. On the read side we reverse 
this to get an OffsetDateTime and strip the UTC offset to get back a 
LocalDateTime. ORC already has support for Timestamp; check "timestamp with 
local time zone" at the end of https://orc.apache.org/docs/types.html. You 
basically provide ORC the time since the UTC epoch in the current writer 
timezone, and ORC stores the time and the writer timezone. During reads, ORC 
compares the reader and writer timezones and applies the necessary offset to 
return the time since the UTC epoch in the reader timezone, which is 
equivalent to the local time on the writer's side.
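
As a rough illustration of the Avro/Parquet path described above (not the actual Iceberg code), the round trip could look like the sketch below. The class and method names are hypothetical, and microsecond precision for the int64 is an assumption made for the example.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;

// Hypothetical helper, for illustration only.
final class UtcRoundTrip {
    // The UTC epoch as an OffsetDateTime.
    static final OffsetDateTime EPOCH = Instant.ofEpochSecond(0).atOffset(ZoneOffset.UTC);

    // Write side: treat the local time as if it were UTC and count
    // microseconds since the epoch (assumed unit) to get an int64.
    static long toMicros(LocalDateTime local) {
        OffsetDateTime asUtc = local.atOffset(ZoneOffset.UTC);
        return ChronoUnit.MICROS.between(EPOCH, asUtc);
    }

    // Read side: rebuild the UTC OffsetDateTime, then strip the offset.
    static LocalDateTime fromMicros(long micros) {
        OffsetDateTime asUtc = ChronoUnit.MICROS.addTo(EPOCH, micros);
        return asUtc.toLocalDateTime();
    }
}
```

Because the offset is always UTC, the conversion is a fixed shift and round-trips any LocalDateTime within the precision of the stored unit.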
   
   So here, we use ZonedDateTime to convert LocalDateTime into the "writer 
timezone specific time" before passing it to ORC. OffsetDateTime won't work here 
because the "writer timezone specific time" is sensitive to the DST rules 
within the timezone, so the offset may not be constant for all dates. On the 
read side we deserialize into a ZonedDateTime and then return LocalDateTime.
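
To make the DST point concrete, here is a minimal sketch using only java.time. It is not the ORC reader/writer code from this PR; the class name, method names, the millisecond unit, and the choice of America/Los_Angeles in the usage note are all assumptions for the example.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical helper, for illustration only.
final class WriterZoneTime {
    // The offset a zone applies to a given local time; it depends on the date
    // because of the zone's DST rules.
    static ZoneOffset offsetAt(LocalDateTime local, ZoneId zone) {
        return local.atZone(zone).getOffset();
    }

    // Write side: interpret the local time in the writer zone and convert it
    // to milliseconds since the UTC epoch (unit assumed for the example).
    static long toEpochMillis(LocalDateTime local, ZoneId writerZone) {
        return local.atZone(writerZone).toInstant().toEpochMilli();
    }

    // Read side: go back through a ZonedDateTime and drop the zone.
    // Note: local times that fall inside a DST gap are shifted by atZone,
    // so they would not round-trip unchanged.
    static LocalDateTime fromEpochMillis(long millis, ZoneId zone) {
        return Instant.ofEpochMilli(millis).atZone(zone).toLocalDateTime();
    }
}
```

For example, with `ZoneId.of("America/Los_Angeles")`, `offsetAt` gives -08:00 for 2020-01-15T12:00 and -07:00 for 2020-07-15T12:00, so a single fixed OffsetDateTime offset cannot represent both dates.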
   
   
