[ 
https://issues.apache.org/jira/browse/HIVE-29033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17986569#comment-17986569
 ] 

Vlad Rozov commented on HIVE-29033:
-----------------------------------

[~zabetak] No, timestamps are persisted in ORC in the local time zone (aka the 
writer time zone). Whether the data is written by Hive 2.x, Hive 3.x, Hive 4.x, 
or Spark, the value persisted on disk is the same, and it is in the local time 
zone. When ORC reads data from disk, it uses the ORC reader options to populate 
{{TimestampColumnVector}} in either UTC or the local time zone, and sets the 
{{TimestampColumnVector.isUTC}} flag accordingly. 

The ticket does not impact Hive users, and I don't know of a Hive-specific use 
case that showcases the bug. As long as Hive *always* (in all possible code 
paths) sets {{ReaderOptions.useUTCTimestamp}} to {{true}}, it is not impacted by 
the ticket. But, IMO, it is not good practice to blindly assume that 
{{TimestampColumnVector.isUTC}} is always {{true}} or that 
{{TimestampColumnVector}} is always in the UTC time zone. That is not the case 
in Spark, which uses the local time zone.
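
To illustrate the concern, here is a minimal, self-contained sketch in plain {{java.time}}. It is not ORC or Hive code: the {{Entry}} class is a hypothetical stand-in for a single timestamp value plus a UTC flag, and {{toInstant}} / {{toInstantAssumingUtc}} are made-up helper names. The zone {{America/Los_Angeles}} is only an example of a non-UTC local zone. The sketch assumes the value is a wall-clock reading whose zone is indicated by the flag, which simplifies how the real vector stores data.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class TimestampZoneSketch {

    /** Hypothetical stand-in for one timestamp value and its UTC flag. */
    static final class Entry {
        final LocalDateTime wallClock;
        final boolean isUTC;

        Entry(LocalDateTime wallClock, boolean isUTC) {
            this.wallClock = wallClock;
            this.isUTC = isUTC;
        }
    }

    /** Honors the flag: interprets the value in UTC or in the given local zone. */
    static Instant toInstant(Entry e, ZoneId localZone) {
        ZoneId zone = e.isUTC ? ZoneOffset.UTC : localZone;
        return e.wallClock.atZone(zone).toInstant();
    }

    /** Ignores the flag and blindly treats the value as UTC. */
    static Instant toInstantAssumingUtc(Entry e) {
        return e.wallClock.toInstant(ZoneOffset.UTC);
    }

    public static void main(String[] args) {
        ZoneId local = ZoneId.of("America/Los_Angeles"); // example local zone
        Entry e = new Entry(LocalDateTime.of(2025, 1, 15, 12, 0), false);

        System.out.println(toInstant(e, local));      // 2025-01-15T20:00:00Z
        System.out.println(toInstantAssumingUtc(e));  // 2025-01-15T12:00:00Z
    }
}
```

When the flag is {{false}}, the two helpers disagree by the local zone's UTC offset (8 hours in this example). When the flag is {{true}}, they agree, which would be consistent with a caller that always reads in UTC never noticing the difference.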

> ORC reader should not assume that TimestampColumnVector is in UTC time zone
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-29033
>                 URL: https://issues.apache.org/jira/browse/HIVE-29033
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive, ORC
>    Affects Versions: 4.1.0
>            Reporter: Vlad Rozov
>            Assignee: Vlad Rozov
>            Priority: Major
>              Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
