[ https://issues.apache.org/jira/browse/IMPALA-12370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Csaba Ringhofer resolved IMPALA-12370.
--------------------------------------
    Fix Version/s: Impala 4.5.0
       Resolution: Fixed

> Add an option to customize timezone when working with UNIXTIME_MICROS columns 
> of Kudu tables
> --------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-12370
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12370
>             Project: IMPALA
>          Issue Type: Improvement
>            Reporter: Alexey Serbin
>            Assignee: Csaba Ringhofer
>            Priority: Major
>              Labels: timezone
>             Fix For: Impala 4.5.0
>
>
> Impala uses the timezone of its server when converting Unix epoch time stored
> in a Kudu table in a column of UNIXTIME_MICROS type (legacy type name
> TIMESTAMP) into a timestamp.  The former (a value stored in a
> UNIXTIME_MICROS column) carries no timezone information, whereas the latter
> (the timestamp returned by Impala) does.  Impala's convention makes sense and
> works fine as long as the data is written and read by Impala or by other
> applications that follow the same convention.
> However, Spark uses a different convention: Spark applications convert
> timestamps to UTC before representing them as Unix epoch time.  As a result,
> when a Spark application stores timestamp data in a Kudu table and the Impala
> servers run in a timezone other than UTC, the timestamps Impala returns when
> reading that data differ from the ones Spark wrote.
> As of now, the workaround is to run Impala servers in the UTC timezone, so 
> the convention used by Spark produces the same result as the convention used 
> by Impala when converting between timestamps and Unix epoch times.
> In this context, it would be useful to be able to customize the timezone
> that Impala uses when working with UNIXTIME_MICROS/TIMESTAMP values stored in
> Kudu tables.  This would free users who run a mix of Spark and Impala
> applications against the same Kudu tables from having to run their clusters
> in the UTC timezone.  Ideally, the setting should be per Kudu table, but a
> system-wide flag is also an option.
> This is similar to IMPALA-1658.
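
The mismatch described above can be illustrated with a small sketch. It is not Impala's or Spark's code. The function names, the fixed UTC-07:00 offset standing in for a non-UTC Impala server, and the `tz` parameter (a stand-in for the requested per-table or system-wide option) are all hypothetical. A fixed offset is used instead of a named zone so the example needs no timezone database.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical stand-in for an Impala server's local timezone (fixed UTC-07:00).
SERVER_TZ = timezone(timedelta(hours=-7))
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def spark_style_write(wall_clock: datetime) -> int:
    """Spark-style convention (as described in the issue): treat the timestamp
    as UTC, then store it as microseconds since the Unix epoch, which is what a
    UNIXTIME_MICROS column holds."""
    delta = wall_clock.replace(tzinfo=timezone.utc) - EPOCH
    return delta // timedelta(microseconds=1)


def read_unixtime_micros(micros: int, tz: timezone = SERVER_TZ) -> datetime:
    """Convert stored epoch microseconds to a naive wall-clock timestamp in `tz`.
    The default models the server-timezone behavior described in the issue;
    passing another zone models the requested (hypothetical) override."""
    instant = EPOCH + timedelta(microseconds=micros)
    return instant.astimezone(tz).replace(tzinfo=None)


if __name__ == "__main__":
    micros = spark_style_write(datetime(2024, 1, 15, 12, 0, 0))
    print(read_unixtime_micros(micros))                # 2024-01-15 05:00:00 (shifted)
    print(read_unixtime_micros(micros, timezone.utc))  # 2024-01-15 12:00:00 (matches)
```

Reading with the server zone shifts the value by the server's UTC offset. Overriding the conversion zone with UTC reproduces what Spark wrote without having to run the whole cluster in UTC.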



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
