Michael Smith created IMPALA-13627:
--------------------------------------
Summary: Impala uses different timezone conversion when reading
Hive with legacy conversion
Key: IMPALA-13627
URL: https://issues.apache.org/jira/browse/IMPALA-13627
Project: IMPALA
Issue Type: Bug
Components: Backend
Affects Versions: Impala 4.4.1
Reporter: Michael Smith
HIVE-12192, HIVE-20007 changed the way that timestamp computations are
performed and to some extend how timestamps are serialized and deserialized in
files (Parquet, Avro) by Hive. HIVE-25104 was added to allow Hive to continue
to write files using the legacy timestamp conversion so that older Hive
versions can read the correct time.
All of that is background to say that Impala does not mirror Hive's timestamp
conversion when reading INT96 coded timestamps that Hive converted from local
time to UTC using legacy timezone conversion. To reproduce
# Start Hive with TZ=Asia/Kuala_Lumpur
# Using beeline
{code}
create table test (d timestamp) stored as parquet;
set hive.parquet.timestamp.write.legacy.conversion.enabled=true;
insert into test values (cast("1900-01-01 00:00:00" as timestamp));
select * from test;
{code}
# Start Impala with TZ=Asia/Kuala_Lumpur
# Run {{impala-shell.sh -q 'select * from test'}}
In this particular example, Asia/Kuala_Lumpur will either map to (LMT) or (SGT)
depending on tzdata version. In either case, that time zone for 1900 differs
from the current UTC+8 timezone shift, so Impala shows a value that's off by ~1
hour.
The Parquet file Hive writes in this case contains
[key_value_metadata|https://parquet.apache.org/docs/file-format/metadata/] such
that Hive can identify what conversion to use when reading the data, so newer
Hive always handles these files correctly
{code}
writer.time.zone=Asia/Kuala_Lumpur
writer.model.name=3.1.3000.7.1.7.1000-141
writer.date.proleptic=false
writer.zone.conversion.legacy=true
{code}
Impala could support the same behavior to be compatible with Hive by
identifying the {{writer.zone.conversion.legacy}} flag and handing conversion
to a SimpleDateFormat in Java ([Hive
code|https://github.com/apache/hive/blob/rel/release-4.0.1/common/src/java/org/apache/hadoop/hive/common/type/TimestampTZUtil.java#L194-L201]).
--
This message was sent by Atlassian Jira
(v8.20.10#820010)