[
https://issues.apache.org/jira/browse/HIVE-29033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17986569#comment-17986569
]
Vlad Rozov commented on HIVE-29033:
-----------------------------------
[~zabetak] No, timestamps are persisted in ORC in the local time zone (aka
writer time zone). Whether the file is written by Hive 2.x, Hive 3.x, Hive 4.x or
Spark, the value persisted on disk is the same and it is in the local time
zone. When ORC reads data from disk, it uses the ORC reader options to populate
{{TimestampColumnVector}} in either UTC or the local time zone and sets the
{{TimestampColumnVector.isUTC}} flag accordingly.
The ticket does not impact Hive users, and I don't know of a Hive-specific use case
that showcases the bug. As long as Hive *always* (in all possible code paths)
sets {{ReaderOptions.useUTCTimestamp}} to {{true}}, it is not impacted by the
ticket. But, IMO, it is not best practice to blindly assume that
{{TimestampColumnVector.isUTC}} is always {{true}} or that
{{TimestampColumnVector}} is always in the UTC time zone. That is not the case in
Spark, which uses the local time zone.
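To illustrate the point about not assuming UTC, here is a minimal, self-contained sketch. It does not use the real ORC or Hive classes: {{TsVector}} is a hypothetical stand-in for {{TimestampColumnVector}} (epoch milliseconds plus an {{isUTC}} flag), and {{wallClock}} is a hypothetical helper name. It assumes the flag tells the consumer which zone the stored milliseconds should be interpreted in, and it only shows a consumer branching on the flag instead of hard-coding UTC.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

final class IsUtcSketch {

    // Hypothetical stand-in for TimestampColumnVector: epoch millis per row
    // plus a flag saying whether those millis are to be read as UTC.
    static final class TsVector {
        final long[] time;
        final boolean isUTC;

        TsVector(long[] time, boolean isUTC) {
            this.time = time;
            this.isUTC = isUTC;
        }
    }

    // Decode a row to a wall-clock value, honoring the flag instead of
    // assuming that the vector is always in UTC.
    static LocalDateTime wallClock(TsVector v, int row, ZoneId localZone) {
        ZoneId zone = v.isUTC ? ZoneOffset.UTC : localZone;
        return LocalDateTime.ofInstant(Instant.ofEpochMilli(v.time[row]), zone);
    }

    public static void main(String[] args) {
        ZoneId local = ZoneOffset.ofHours(9); // assumed reader zone for the demo
        TsVector utcVector = new TsVector(new long[] {0L}, true);
        TsVector localVector = new TsVector(new long[] {0L}, false);

        System.out.println(wallClock(utcVector, 0, local));   // prints 1970-01-01T00:00
        System.out.println(wallClock(localVector, 0, local)); // prints 1970-01-01T09:00
    }
}
```

The same stored milliseconds decode to different wall-clock values depending on the flag, which is why a code path that ignores it is only safe when every reader is configured for UTC.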
> ORC reader should not assume that TimestampColumnVector is in UTC time zone
> ---------------------------------------------------------------------------
>
> Key: HIVE-29033
> URL: https://issues.apache.org/jira/browse/HIVE-29033
> Project: Hive
> Issue Type: Bug
> Components: Hive, ORC
> Affects Versions: 4.1.0
> Reporter: Vlad Rozov
> Assignee: Vlad Rozov
> Priority: Major
> Labels: pull-request-available
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)