Alexey Serbin created IMPALA-12370:
--------------------------------------
Summary: Add an option to customize timezone when working with
UNIXTIME_MICROS columns of Kudu tables
Key: IMPALA-12370
URL: https://issues.apache.org/jira/browse/IMPALA-12370
Project: IMPALA
Issue Type: Improvement
Reporter: Alexey Serbin
Impala uses the timezone of its server when converting Unix epoch time stored
in a Kudu table in a column of UNIXTIME_MICROS type (legacy type name
TIMESTAMP) into a timestamp. The former carries no timezone information, but
the latter does. Impala's convention makes sense and works fine as long as the
data is written and read by Impala or by another application that uses the
same convention.
However, Spark uses a different convention: Spark applications convert
timestamps to the UTC timezone before representing the result as Unix epoch
time. So, when a Spark application stores timestamp data in a Kudu table,
reading that data back via Impala produces different timestamps if the Impala
servers run in a timezone other than UTC.
As of now, the workaround is to run Impala servers in the UTC timezone, so the
convention used by Spark produces the same result as the convention used by
Impala when converting between timestamps and Unix epoch times.
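The mismatch between the two conventions can be sketched as follows (a minimal Python illustration; the sample timestamp and the UTC-7 offset are hypothetical, chosen only to make the shift visible):

```python
from datetime import datetime, timezone, timedelta

# Writing side (Spark convention): take a wall-clock timestamp, interpret it
# as UTC, and store the resulting Unix epoch time in microseconds -- the
# representation of Kudu's UNIXTIME_MICROS type.
wall_clock = datetime(2023, 8, 15, 12, 0, 0)  # 2023-08-15 12:00:00
unixtime_micros = int(
    wall_clock.replace(tzinfo=timezone.utc).timestamp() * 1_000_000
)

# Reading side (Impala convention): convert the epoch value back to a
# wall-clock timestamp using the server's timezone.
def read_back(tz):
    return datetime.fromtimestamp(unixtime_micros / 1_000_000, tz).replace(tzinfo=None)

utc_read = read_back(timezone.utc)                     # 2023-08-15 12:00:00
local_read = read_back(timezone(timedelta(hours=-7)))  # 2023-08-15 05:00:00

# With a UTC server the round trip is lossless; with a UTC-7 server the
# timestamp read back is shifted by seven hours.
assert utc_read == wall_clock
assert local_read == wall_clock - timedelta(hours=7)
```

This is why running the Impala servers in UTC makes the two conventions agree: the reading-side conversion then inverts the writing-side one exactly.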
In this context, it would be great to make it possible to customize the
timezone that Impala uses when working with UNIXTIME_MICROS/TIMESTAMP values
stored in Kudu tables. That would free users from having to run their clusters
in the UTC timezone when a mix of Spark and Impala applications works with the
same data stored in Kudu tables. Ideally, the setting would be per Kudu table,
but a system-wide flag is also an option.
This is similar to IMPALA-1658.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)