[
https://issues.apache.org/jira/browse/HIVE-29033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17986435#comment-17986435
]
Vlad Rozov commented on HIVE-29033:
-----------------------------------
[~zabetak] Please check
{{org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.nextTimestamp}}. The method
converts a {{TimestampColumnVector}} {{row}} to a {{TimestampWritableV2}}. The
conversion requires knowing the time zone of the
{{TimestampColumnVector}}, but Hive blindly assumes it is the UTC
time zone. This works within Hive because Hive uses UTC for all of its operations
and initializes {{org.apache.orc.impl.ReaderImpl}} with an
{{org.apache.hadoop.hive.ql.io.orc.OrcFile.ReaderOptions}} that always sets
{{useUTCTimestamp}} to {{true}}. So far so good. The problem arises when Spark tries to
integrate with Hive 4.x (I am working on a PR that upgrades Spark's dependency on
Hive from 2.3.10 to 4.x). Both Spark and Hive 2.3.10 use the local time zone, so
the blind UTC conversion in Hive 4.x causes incorrect results (a regression) in
Spark after the upgrade.
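To illustrate the mismatch described above: the same stored epoch value yields different wall-clock timestamps depending on which zone the reader assumes. The sketch below is illustrative only (the class and helper names are hypothetical, not Hive code); it models the UTC assumption ({{useUTCTimestamp=true}}) versus a local-zone interpretation using plain {{java.time}}, with Asia/Tokyo standing in for an arbitrary local zone.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class TimestampZoneDemo {
    // Render an epoch-millis value as a wall-clock timestamp in UTC,
    // modeling the useUTCTimestamp=true path in Hive 4.x.
    static LocalDateTime asUtc(long epochMillis) {
        return LocalDateTime.ofInstant(Instant.ofEpochMilli(epochMillis), ZoneOffset.UTC);
    }

    // Render the same value in an explicit zone, modeling a reader
    // (Spark / Hive 2.3.10) that works in the local time zone.
    static LocalDateTime asZone(long epochMillis, ZoneId zone) {
        return LocalDateTime.ofInstant(Instant.ofEpochMilli(epochMillis), zone);
    }

    public static void main(String[] args) {
        long epochMillis = 0L; // 1970-01-01T00:00:00Z
        // Same stored value, two different wall-clock results:
        System.out.println(asUtc(epochMillis));                           // 1970-01-01T00:00
        System.out.println(asZone(epochMillis, ZoneId.of("Asia/Tokyo"))); // 1970-01-01T09:00
    }
}
```

Any fixed-offset difference between the assumed and actual zones shifts every decoded timestamp by that offset, which is the regression Spark observes after the upgrade.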
> ORC reader should not assume that TimestampColumnVector is in UTC time zone
> ---------------------------------------------------------------------------
>
> Key: HIVE-29033
> URL: https://issues.apache.org/jira/browse/HIVE-29033
> Project: Hive
> Issue Type: Bug
> Components: Hive, ORC
> Affects Versions: 4.1.0
> Reporter: Vlad Rozov
> Assignee: Vlad Rozov
> Priority: Major
> Labels: pull-request-available
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)