Jesus Camacho Rodriguez commented on HIVE-12192:

[~haozhun], the table is nice, thanks.

For 'Timestamp w local tz' and 'Timestamp w tz', I think you got the 
progression right.

However, 'timestamp' is slightly different.
The type is not really an instant, even before HIVE-12192. From a querying 
perspective, its semantics are similar to the SQL timestamp (i.e., a 
LocalDateTime). Internally, however (e.g., during optimization or execution), 
it is represented differently depending on the time zone of your system, 
because the value is stored in a java.sql.Timestamp.
For example, if you store '1970-01-01 00:00:00' in PST, you will get 
'1970-01-01 00:00:00' back when you read it from IST. However, the internal 
representation will differ between the writer HS2 and the reader HS2.
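The following is a minimal Java sketch of that example. It assumes that the JVM default time zone stands in for the HS2 system time zone, and `WallClockDemo`, `internalMillis`, and `render` are hypothetical names. Parsing the same wall-clock string with `java.sql.Timestamp.valueOf` yields different epoch millis under PST and IST, yet each value prints back as the same wall-clock time in its own zone.

```java
import java.sql.Timestamp;
import java.util.TimeZone;

public class WallClockDemo {
    // Parses a wall-clock string the way java.sql.Timestamp does inside a JVM
    // whose default time zone is zoneId, and returns the internal epoch millis.
    static long internalMillis(String wallClock, String zoneId) {
        TimeZone saved = TimeZone.getDefault();
        try {
            TimeZone.setDefault(TimeZone.getTimeZone(zoneId));
            return Timestamp.valueOf(wallClock).getTime();
        } finally {
            TimeZone.setDefault(saved); // restore the JVM-wide default
        }
    }

    // Renders epoch millis as a wall-clock string under zoneId.
    static String render(long millis, String zoneId) {
        TimeZone saved = TimeZone.getDefault();
        try {
            TimeZone.setDefault(TimeZone.getTimeZone(zoneId));
            return new Timestamp(millis).toString();
        } finally {
            TimeZone.setDefault(saved);
        }
    }

    public static void main(String[] args) {
        String value = "1970-01-01 00:00:00";
        long writerMillis = internalMillis(value, "America/Los_Angeles"); // 28800000
        long readerMillis = internalMillis(value, "Asia/Kolkata");        // -19800000
        System.out.println(writerMillis + " vs " + readerMillis);
        System.out.println(render(writerMillis, "America/Los_Angeles"));  // 1970-01-01 00:00:00.0
        System.out.println(render(readerMillis, "Asia/Kolkata"));         // 1970-01-01 00:00:00.0
    }
}
```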
How does ORC fix this? It records the system time zone for the timestamp type 
when the data is written. It then uses that time zone and the reader's time 
zone to compute the difference between the two and applies that displacement.
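Below is a simplified sketch of that adjustment. It assumes the writer's zone ID is available to the reader, and `adjustForReader` is a hypothetical helper. This is not ORC's actual reader code, which has to handle more cases. To keep the arithmetic obvious, both offsets are evaluated at the written instant, so values near a DST transition may be off by the DST amount.

```java
import java.util.TimeZone;

public class ZoneDisplacement {
    // Shifts epoch millis recorded under the writer's zone so that, when
    // rendered in the reader's zone, they show the same wall-clock time.
    // Simplification: both offsets are taken at the written instant.
    static long adjustForReader(long writerMillis, String writerZone, String readerZone) {
        int writerOffset = TimeZone.getTimeZone(writerZone).getOffset(writerMillis);
        int readerOffset = TimeZone.getTimeZone(readerZone).getOffset(writerMillis);
        return writerMillis + writerOffset - readerOffset;
    }

    public static void main(String[] args) {
        // '1970-01-01 00:00:00' written in PST is 28800000 ms after the epoch.
        long adjusted = adjustForReader(28_800_000L, "America/Los_Angeles", "Asia/Kolkata");
        System.out.println(adjusted); // -19800000, i.e. midnight in IST
    }
}
```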

The goal of this issue is to make the internal representation independent of 
the system time zone. This would fix the issue described above, as well as 
other issues that stem from this representation when Hive interacts with other 
projects, e.g., Calcite.
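The next sketch shows what a zone-independent encoding could look like. It assumes that the wall-clock value is held as a `java.time.LocalDateTime` and encoded as epoch millis at UTC. It illustrates the goal only and is not the HIVE-12192 patch. `UtcEncoding`, `encode`, and `decode` are hypothetical names. Because the JVM default time zone is never consulted, the writer and the reader produce the same internal value.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.TimeZone;

public class UtcEncoding {
    private static final DateTimeFormatter FORMAT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Encodes a wall-clock string as epoch millis, treating it as UTC.
    static long encode(String wallClock) {
        return LocalDateTime.parse(wallClock, FORMAT)
            .toInstant(ZoneOffset.UTC)
            .toEpochMilli();
    }

    // Decodes epoch millis back to the wall-clock string, again via UTC.
    static String decode(long millis) {
        return Instant.ofEpochMilli(millis)
            .atOffset(ZoneOffset.UTC)
            .toLocalDateTime()
            .format(FORMAT);
    }

    public static void main(String[] args) {
        TimeZone saved = TimeZone.getDefault();
        try {
            TimeZone.setDefault(TimeZone.getTimeZone("America/Los_Angeles"));
            long written = encode("1970-01-01 00:00:00");
            TimeZone.setDefault(TimeZone.getTimeZone("Asia/Kolkata"));
            long read = encode("1970-01-01 00:00:00");
            System.out.println(written == read);  // true (both are 0)
            System.out.println(decode(read));     // 1970-01-01 00:00:00
        } finally {
            TimeZone.setDefault(saved);
        }
    }
}
```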

> Hive should carry out timestamp computations in UTC
> ---------------------------------------------------
>                 Key: HIVE-12192
>                 URL: https://issues.apache.org/jira/browse/HIVE-12192
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Hive
>            Reporter: Ryan Blue
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Major
>              Labels: timestamp
>         Attachments: HIVE-12192.patch
> Hive currently uses the "local" time of a java.sql.Timestamp to represent the 
> SQL data type TIMESTAMP WITHOUT TIME ZONE. The purpose is to be able to use 
> {{Timestamp#getYear()}} and similar methods to implement SQL functions like 
> {{year}}.
> When the SQL session's time zone is a DST zone, such as America/Los_Angeles 
> that alternates between PST and PDT, there are times that cannot be 
> represented because the effective zone skips them.
> {code}
> hive> select TIMESTAMP '2015-03-08 02:10:00.101';
> 2015-03-08 03:10:00.101
> {code}
> Using UTC instead of the SQL session time zone as the underlying zone for a 
> java.sql.Timestamp avoids this bug, while still returning correct values for 
> {{getYear}} etc. Using UTC as the convenience representation (timestamp 
> without time zone has no real zone) would make timestamp calculations more 
> consistent and avoid similar problems in the future.
> Notably, this would break the {{unix_timestamp}} UDF that specifies the 
> result is with respect to ["the default timezone and default 
> locale"|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions].
> That function would need to be updated to use the 
> {{System.getProperty("user.timezone")}} zone.
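The points in the quoted description can be checked with the small Java sketch below. It assumes that the JVM default time zone stands in for the SQL session time zone. `DstGapDemo`, `roundTrip`, and `unixTimestamp` are hypothetical names, and `unixTimestamp` is only an illustration of the proposed `unix_timestamp` behavior, not Hive's UDF. The sketch shows three things. In America/Los_Angeles, `java.sql.Timestamp` silently moves the skipped 02:10 to 03:10. With UTC as the underlying zone, the literal survives unchanged. A zone-less value can still be turned into a Unix timestamp by interpreting it in the zone named by `user.timezone`.

```java
import java.sql.Timestamp;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.util.TimeZone;

public class DstGapDemo {
    // Parses a literal with java.sql.Timestamp while the JVM default zone is
    // zoneId, then prints it back using that same default zone.
    static String roundTrip(String literal, String zoneId) {
        TimeZone saved = TimeZone.getDefault();
        try {
            TimeZone.setDefault(TimeZone.getTimeZone(zoneId));
            return Timestamp.valueOf(literal).toString();
        } finally {
            TimeZone.setDefault(saved);
        }
    }

    // Hypothetical unix_timestamp variant: interprets a zone-less wall-clock
    // value in the zone named by the user.timezone system property, falling
    // back to the system default if the property is unset or empty.
    static long unixTimestamp(LocalDateTime wallClock) {
        String tz = System.getProperty("user.timezone");
        ZoneId zone = (tz == null || tz.isEmpty()) ? ZoneId.systemDefault() : ZoneId.of(tz);
        return wallClock.atZone(zone).toEpochSecond();
    }

    public static void main(String[] args) {
        String literal = "2015-03-08 02:10:00.101";
        // In a DST zone, the skipped hour is shifted forward.
        System.out.println(roundTrip(literal, "America/Los_Angeles")); // 2015-03-08 03:10:00.101
        // With UTC as the underlying zone, there is no gap to fall into.
        System.out.println(roundTrip(literal, "UTC"));                 // 2015-03-08 02:10:00.101
    }
}
```

`TimeZone.setDefault` changes JVM-wide state. The helpers restore the previous default, but this pattern is not safe when other threads depend on the default zone, which is part of why a fixed UTC representation is attractive.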

This message was sent by Atlassian JIRA
