[ 
https://issues.apache.org/jira/browse/IMPALA-12370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17851132#comment-17851132
 ] 

Csaba Ringhofer commented on IMPALA-12370:
------------------------------------------

>That will free the users from the inconvenience of running their clusters in 
>the UTC timezone
The timezone doesn't need to be set at the server level in Impala; it can be set 
per query using the query option "timezone", e.g. set timezone=CET;

> Ideally, the setting should be per Kudu table, but a system-wide flag is also 
> an option.
The query option convert_kudu_utc_timestamps only affects reading, so there could 
be a writing-related one too, e.g. write_kudu_utc_timestamps (or 
convert_kudu_utc_timestamps could be changed to also affect writing).

I agree that the ideal would be to be able to override this per table, for 
example with a table property like "impala.use_kudu_utc_timestamps" which would 
override both convert_kudu_utc_timestamps / write_kudu_utc_timestamps.
It would be even better if other components also respected this property, 
so if it is false, they would write in the timezone-agnostic "Impala" way. 
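A minimal sketch of how these options could fit together in a session. Note 
that only timezone and convert_kudu_utc_timestamps exist today; 
write_kudu_utc_timestamps and the impala.use_kudu_utc_timestamps table 
property are the hypothetical additions discussed above:

```sql
-- Existing: read Spark-written UNIXTIME_MICROS values as UTC-based epoch
-- times, converted to the session timezone on read.
SET timezone=CET;
SET convert_kudu_utc_timestamps=true;
SELECT ts FROM kudu_table;

-- Hypothetical write-side counterpart: convert session-local timestamps
-- to UTC-based epoch times before storing them in Kudu.
SET write_kudu_utc_timestamps=true;
INSERT INTO kudu_table VALUES (now());

-- Hypothetical per-table override, taking precedence over both query options.
ALTER TABLE kudu_table
  SET TBLPROPERTIES ('impala.use_kudu_utc_timestamps'='true');
```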

> Add an option to customize timezone when working with UNIXTIME_MICROS columns 
> of Kudu tables
> --------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-12370
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12370
>             Project: IMPALA
>          Issue Type: Improvement
>            Reporter: Alexey Serbin
>            Priority: Major
>              Labels: timezone
>
> Impala uses the timezone of its server when converting Unix epoch time stored 
> in a Kudu table in a column of UNIXTIME_MICROS type (legacy type name 
> TIMESTAMP) into a timestamp.  As one can see, the former (the values stored in 
> a column of the UNIXTIME_MICROS type) does not contain information about 
> timezone, but the latter (the result timestamp returned by Impala) does, and 
> Impala's convention makes sense and works fine if the data is being written 
> and read by Impala or by other applications that use the same convention.
> However, Spark uses a different convention.  Spark applications convert 
> timestamps to the UTC timezone before representing the result as Unix epoch 
> time.  So, when a Spark application stores timestamp data in a Kudu table, 
> the result timestamps differ when reading the stored data via Impala if the 
> Impala servers are running in a timezone other than UTC.
> As of now, the workaround is to run Impala servers in the UTC timezone, so 
> the convention used by Spark produces the same result as the convention used 
> by Impala when converting between timestamps and Unix epoch times.
> In this context, it would be great to make it possible to customize the 
> timezone that's used by Impala when working with UNIXTIME_MICROS/TIMESTAMP 
> values stored in Kudu tables.  That will free the users from the 
> inconvenience of running their clusters in the UTC timezone if they use a mix 
> of Spark/Impala applications to work with the same data stored in Kudu 
> tables.  Ideally, the setting should be per Kudu table, but a system-wide 
> flag is also an option.
> This is similar to IMPALA-1658.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
