[
https://issues.apache.org/jira/browse/IMPALA-12370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17851132#comment-17851132
]
Csaba Ringhofer commented on IMPALA-12370:
------------------------------------------
>That will free the users from the inconvenience of running their clusters in
>the UTC timezone
The timezone doesn't need to be set at the server level in Impala; it can be
set per query using the query option "timezone", e.g. set timezone=CET;
> Ideally, the setting should be per Kudu table, but a system-wide flag is also
> an option.
The query option convert_kudu_utc_timestamps only affects reading, so there
could be a writing-related one too, e.g. write_kudu_utc_timestamps (or
convert_kudu_utc_timestamps could be changed to also affect writing).
I agree that the ideal would be the ability to override this per table, for
example with a table property like "impala.use_kudu_utc_timestamps" which would
override both convert_kudu_utc_timestamps and write_kudu_utc_timestamps.
It would be even better if other components also respected this property, so
that if it is false, they would write in the timezone-agnostic "Impala" way.
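As a rough sketch of the read-side semantics being discussed (the function name
is hypothetical and the fixed CET offset is an assumption for illustration;
this is not Impala's implementation):

```python
from datetime import datetime, timezone, timedelta

def read_kudu_micros(micros, convert_utc, session_tz):
    """Sketch: with convert_utc the stored UNIXTIME_MICROS value is treated
    as UTC and converted to the session timezone (convert_kudu_utc_timestamps
    behavior); without it, the UTC wall clock is returned as-is, i.e. the
    timezone-agnostic reading."""
    utc_dt = datetime.fromtimestamp(micros / 1_000_000, tz=timezone.utc)
    if convert_utc:
        return utc_dt.astimezone(session_tz).replace(tzinfo=None)
    return utc_dt.replace(tzinfo=None)

# Fixed +01:00 offset as a stand-in for CET (ignoring DST).
CET = timezone(timedelta(hours=1))

# 2023-08-15 11:00:00 UTC stored as microseconds since the Unix epoch.
micros = 1_000_000 * int(datetime(2023, 8, 15, 11, 0, tzinfo=timezone.utc).timestamp())

print(read_kudu_micros(micros, convert_utc=True, session_tz=CET))   # 2023-08-15 12:00:00
print(read_kudu_micros(micros, convert_utc=False, session_tz=CET))  # 2023-08-15 11:00:00
```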
> Add an option to customize timezone when working with UNIXTIME_MICROS columns
> of Kudu tables
> --------------------------------------------------------------------------------------------
>
> Key: IMPALA-12370
> URL: https://issues.apache.org/jira/browse/IMPALA-12370
> Project: IMPALA
> Issue Type: Improvement
> Reporter: Alexey Serbin
> Priority: Major
> Labels: timezone
>
> Impala uses the timezone of its server when converting Unix epoch time stored
> in a Kudu table in a column of UNIXTIME_MICROS type (legacy type name
> TIMESTAMP) into a timestamp. As one can see, the former (the values stored in
> a column of the UNIXTIME_MICROS type) do not contain timezone information,
> but the latter (the resulting timestamp returned by Impala) does. Impala's
> convention makes sense and works fine as long as the data is written and
> read by Impala or by other applications that use the same convention.
> However, Spark uses a different convention. Spark applications convert
> timestamps to the UTC timezone before representing the result as Unix epoch
> time. So, when a Spark application stores timestamp data in a Kudu table,
> the timestamps read back via Impala differ if the Impala servers are running
> in a timezone other than UTC.
> As of now, the workaround is to run Impala servers in the UTC timezone, so
> the convention used by Spark produces the same result as the convention used
> by Impala when converting between timestamps and Unix epoch times.
> In this context, it would be great to make it possible to customize the
> timezone that's used by Impala when working with UNIXTIME_MICROS/TIMESTAMP
> values stored in Kudu tables. That will free the users from the
> inconvenience of running their clusters in the UTC timezone if they use a mix
> of Spark/Impala applications to work with the same data stored in Kudu
> tables. Ideally, the setting should be per Kudu table, but a system-wide
> flag is also an option.
> This is similar to IMPALA-1658.
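The convention mismatch described in the quoted issue can be sketched in
Python (the fixed CET offset and the wall-clock value are assumptions for
illustration):

```python
from datetime import datetime, timezone, timedelta

# Fixed +01:00 offset as a stand-in for CET (ignoring DST).
CET = timezone(timedelta(hours=1))

# A local wall-clock time observed in CET.
local_ts = datetime(2023, 8, 15, 12, 0, 0)

# Spark's convention: interpret the wall clock in the session timezone,
# convert to UTC, then store microseconds since the Unix epoch.
spark_micros = int(local_ts.replace(tzinfo=CET).timestamp() * 1_000_000)

# Impala's timezone-agnostic convention: store the wall-clock fields as if
# they were UTC.
impala_micros = int(local_ts.replace(tzinfo=timezone.utc).timestamp() * 1_000_000)

# The two conventions disagree by exactly the UTC offset of the server
# timezone, which is the skew users see when mixing Spark and Impala.
print((impala_micros - spark_micros) / 3_600_000_000)  # hours of skew
```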
--
This message was sent by Atlassian Jira
(v8.20.10#820010)