[
https://issues.apache.org/jira/browse/IMPALA-12370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Csaba Ringhofer resolved IMPALA-12370.
--------------------------------------
Fix Version/s: Impala 4.5.0
Resolution: Fixed
> Add an option to customize timezone when working with UNIXTIME_MICROS columns
> of Kudu tables
> --------------------------------------------------------------------------------------------
>
> Key: IMPALA-12370
> URL: https://issues.apache.org/jira/browse/IMPALA-12370
> Project: IMPALA
> Issue Type: Improvement
> Reporter: Alexey Serbin
> Assignee: Csaba Ringhofer
> Priority: Major
> Labels: timezone
> Fix For: Impala 4.5.0
>
>
> Impala uses the timezone of its server when converting Unix epoch time stored
> in a Kudu table in a column of UNIXTIME_MICROS type (legacy type name
> TIMESTAMP) into a timestamp. The former (a value stored in a column of the
> UNIXTIME_MICROS type) does not contain information about the timezone, but
> the latter (the timestamp returned by Impala) does. Impala's convention
> makes sense and works fine if the data is written and read by Impala or by
> another application that uses the same convention.
> However, Spark uses a different convention. Spark applications convert
> timestamps to the UTC timezone before representing the result as Unix epoch
> time. So, when a Spark application stores timestamp data in a Kudu table,
> the timestamps read back via Impala differ from the originals if the Impala
> servers run in a timezone other than UTC.
> As of now, the workaround is to run Impala servers in the UTC timezone, so
> the convention used by Spark produces the same result as the convention used
> by Impala when converting between timestamps and Unix epoch times.
> In this context, it would be great to make it possible to customize the
> timezone that Impala uses when working with UNIXTIME_MICROS/TIMESTAMP
> values stored in Kudu tables. That would free users from the
> inconvenience of running their clusters in the UTC timezone if they use a mix
> of Spark/Impala applications to work with the same data stored in Kudu
> tables. Ideally, the setting should be per Kudu table, but a system-wide
> flag is also an option.
> This is similar to IMPALA-1658.
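
The mismatch described in the quoted issue can be illustrated with a minimal Python sketch. It does not call Impala, Spark, or Kudu; it only models two conventions for turning a zoneless wall-clock timestamp into epoch microseconds and back. The helper names (to_unixtime_micros, from_unixtime_micros), the UTC+2 zone, and the sample timestamp are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_unixtime_micros(wall_clock: datetime, zone: timezone) -> int:
    """Interpret a zoneless wall-clock timestamp in `zone`; return epoch microseconds."""
    aware = wall_clock.replace(tzinfo=zone)
    return (aware - EPOCH) // timedelta(microseconds=1)


def from_unixtime_micros(micros: int, zone: timezone) -> datetime:
    """Render epoch microseconds as a zoneless wall-clock timestamp in `zone`."""
    aware = EPOCH + timedelta(microseconds=micros)
    return aware.astimezone(zone).replace(tzinfo=None)


# Hypothetical non-UTC zone used by one side of the exchange (an assumption).
LOCAL_ZONE = timezone(timedelta(hours=2))

wall_clock = datetime(2023, 8, 15, 12, 0, 0)

# The writer applies LOCAL_ZONE when producing the stored epoch value.
stored = to_unixtime_micros(wall_clock, LOCAL_ZONE)

# A reader using the same zone recovers the original wall-clock value.
same_convention = from_unixtime_micros(stored, LOCAL_ZONE)

# A reader that assumes UTC sees a value shifted by the zone's UTC offset.
other_convention = from_unixtime_micros(stored, timezone.utc)

print(same_convention, other_convention, wall_clock - other_convention)
```

When both sides use UTC, the round trip is lossless, which is why running the servers in UTC works as a workaround.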