[
https://issues.apache.org/jira/browse/HIVE-16418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15971840#comment-15971840
]
Ashutosh Chauhan commented on HIVE-16418:
-----------------------------------------
We need to think about the storage type for Timestamp at different stages of
query processing:
* On-disk format: whether to store the TZ or not. The primary concern is
fidelity of the original data; the secondary concern is storage efficiency.
* In-memory format: the one on which computations are performed. As I see it,
our current Timestamp choice here is inappropriate. The issue is that
java.sql.Timestamp (which implicitly assumes the local time zone) doesn't
correspond to either SQL Timestamp (which is essentially zoneless) or
Timestamp with Time Zone (which has a zone, but java.sql.Timestamp doesn't
let you set one). As I suggested, the in-memory representation (i.e. the one
on which all computations are performed) should either directly use
LocalTimeZone and ZonedTimeZone or model its behavior on them.
* Serialization format: used to transfer timestamps between different
vertices. Here the primary concern is performance, which we get if the TZ is
stored separately.
In light of the above, I am OK with your proposal of using choice #2, but I
think you still need to think about the in-memory format, because apart from
to_utc_timestamp and related UDFs, implementing the new type Timestamp with
Time Zone on top of java.sql.Timestamp will be error-prone.
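
To illustrate the in-memory concern, here is a minimal sketch. It assumes that
"LocalTimeZone" and "ZonedTimeZone" above refer to java.time.LocalDateTime and
java.time.ZonedDateTime; that reading is an assumption. The class and method
names (TimestampSemantics, renderUnderDefaultZone, resolve) are hypothetical
and are not part of Hive.

```java
import java.sql.Timestamp;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.util.TimeZone;

// Hypothetical demo class, not Hive code.
class TimestampSemantics {

    // Renders one java.sql.Timestamp while the JVM default zone is set to
    // zoneId, then restores the previous default. The Timestamp itself
    // carries no zone; only the rendering changes.
    static String renderUnderDefaultZone(Timestamp ts, String zoneId) {
        TimeZone saved = TimeZone.getDefault();
        try {
            TimeZone.setDefault(TimeZone.getTimeZone(zoneId));
            return ts.toString();
        } finally {
            TimeZone.setDefault(saved);
        }
    }

    // Resolves a zoneless wall-clock value to an instant by attaching an
    // explicit zone, which is what ZonedDateTime makes possible.
    static Instant resolve(LocalDateTime wallClock, String zoneId) {
        return wallClock.atZone(ZoneId.of(zoneId)).toInstant();
    }

    public static void main(String[] args) {
        Timestamp epoch = new Timestamp(0L);

        // Same object, different text, depending on the default zone.
        System.out.println(renderUnderDefaultZone(epoch, "UTC"));       // 1970-01-01 00:00:00.0
        System.out.println(renderUnderDefaultZone(epoch, "GMT+08:00")); // 1970-01-01 08:00:00.0

        // A zoneless value prints the same regardless of the default zone.
        LocalDateTime wallClock = LocalDateTime.of(2017, 4, 17, 12, 0, 0);
        System.out.println(wallClock);

        // The zone is explicit, so the resulting instant is unambiguous.
        System.out.println(resolve(wallClock, "UTC"));
        System.out.println(resolve(wallClock, "+08:00"));
    }
}
```

The point of the sketch is only that java.sql.Timestamp behaves like neither
type: its rendering depends on the JVM default zone, while LocalDateTime has
no zone at all and ZonedDateTime carries its zone explicitly.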
> Allow HiveKey to skip some bytes for comparison
> -----------------------------------------------
>
> Key: HIVE-16418
> URL: https://issues.apache.org/jira/browse/HIVE-16418
> Project: Hive
> Issue Type: New Feature
> Reporter: Rui Li
> Assignee: Rui Li
> Attachments: HIVE-16418.1.patch
>
>
> The feature is required when we have to serialize some fields and prevent
> them from being used in comparison, e.g. HIVE-14412.