Alexey Serbin created IMPALA-12370:
--------------------------------------

             Summary: Add an option to customize timezone when working with 
UNIXTIME_MICROS columns of Kudu tables
                 Key: IMPALA-12370
                 URL: https://issues.apache.org/jira/browse/IMPALA-12370
             Project: IMPALA
          Issue Type: Improvement
            Reporter: Alexey Serbin


Impala uses its server's timezone when converting Unix epoch time stored in a 
UNIXTIME_MICROS column (legacy type name TIMESTAMP) of a Kudu table into a 
timestamp.  The former carries no timezone information while the latter does, 
and Impala's convention makes sense and works fine as long as the data is 
written and read by Impala or by other applications that use the same 
convention.

However, Spark uses a different convention: Spark applications convert 
timestamps to the UTC timezone before representing the result as Unix epoch 
time.  So, when a Spark application stores timestamp data in a Kudu table and 
the Impala servers run in a timezone other than UTC, the timestamps read back 
via Impala differ from the ones that were written.
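The mismatch can be illustrated with a small Python sketch (outside of either system; the function names, the example timestamp, and the America/Los_Angeles server timezone are purely illustrative):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def write_micros_utc(ts: datetime) -> int:
    """Spark-style writer: interpret the wall-clock timestamp as UTC and
    store it as microseconds since the Unix epoch."""
    return int(ts.replace(tzinfo=timezone.utc).timestamp() * 1_000_000)

def read_micros_local(micros: int, tz: ZoneInfo) -> datetime:
    """Impala-style reader: interpret the stored epoch microseconds in the
    server's timezone and return the resulting wall-clock timestamp."""
    return datetime.fromtimestamp(micros / 1_000_000, tz).replace(tzinfo=None)

# A timestamp written under Spark's UTC convention...
stored = write_micros_utc(datetime(2023, 8, 15, 12, 0, 0))

# ...read back by a server running in a non-UTC timezone: the wall-clock
# value shifts by the server's UTC offset (here, UTC-7 during PDT).
shifted = read_micros_local(stored, ZoneInfo("America/Los_Angeles"))

# Reading with the UTC timezone round-trips the original value.
round_tripped = read_micros_local(stored, ZoneInfo("UTC"))
```

The round-trip only works when both sides agree on the timezone used for the epoch conversion, which is exactly what the workaround below enforces.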

As of now, the workaround is to run Impala servers in the UTC timezone, so 
that the convention used by Spark produces the same result as the one used by 
Impala when converting between timestamps and Unix epoch times.

In this context, it would be great to make it possible to customize the 
timezone Impala uses when working with UNIXTIME_MICROS/TIMESTAMP values stored 
in Kudu tables.  That would free users from the inconvenience of running their 
clusters in the UTC timezone when they use a mix of Spark and Impala 
applications to work with the same data stored in Kudu tables.  Ideally, the 
setting would be per Kudu table, but a system-wide flag is also an option.

This is similar to IMPALA-1658.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
