Csaba Ringhofer created IMPALA-7723:
---------------------------------------
Summary: Recognize int64 timestamps in CREATE TABLE LIKE PARQUET
Key: IMPALA-7723
URL: https://issues.apache.org/jira/browse/IMPALA-7723
Project: IMPALA
Issue Type: Improvement
Components: Frontend
Reporter: Csaba Ringhofer
IMPALA-5050 adds support for reading int64-encoded Parquet timestamps. These
columns have the int64 physical type, so the converted/logical type has to be
used to differentiate them from BIGINTs. They can be read either as BIGINT or
as TIMESTAMP, depending on the table's schema.
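To illustrate the distinction, here is a minimal sketch in Python of how a
schema-inference step could choose between BIGINT and TIMESTAMP for an int64
column. The function name, the flag, and the string type names are
hypothetical and are not Impala's actual frontend code; only the Parquet
converted type names TIMESTAMP_MILLIS and TIMESTAMP_MICROS come from the
Parquet format itself.

```python
from typing import Optional

# Parquet converted types that mark an int64 column as a timestamp.
INT64_TIMESTAMP_TYPES = {"TIMESTAMP_MILLIS", "TIMESTAMP_MICROS"}


def infer_int64_column_type(converted_type: Optional[str],
                            recognize_int64_timestamps: bool = False) -> str:
    """Hypothetical helper: pick a table column type for an int64 Parquet column.

    With the flag off, every int64 column becomes BIGINT regardless of its
    annotation. With the flag on, annotated columns become TIMESTAMP, which is
    the behavior this issue proposes.
    """
    if recognize_int64_timestamps and converted_type in INT64_TIMESTAMP_TYPES:
        return "TIMESTAMP"
    return "BIGINT"
```

The physical type alone cannot drive the decision here, since an unannotated
int64 column and an annotated one look identical at that level.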
CREATE TABLE LIKE PARQUET could also convert these columns to TIMESTAMP instead
of BIGINT, but I decided to postpone adding this feature for two reasons:
1. It could break the following possible workflow:
- generate Parquet files (that contain int64 timestamps) with some tool
- use Impala's CREATE TABLE LIKE PARQUET + LOAD DATA to make them accessible as
a table
- run some queries that rely on interpreting these columns as integers
Adding CAST(col AS BIGINT) to the query would make this even worse, as it would
silently convert the timestamp to Unix time in seconds instead of
micros/millis.
2. Adding support for int64 timestamps with nanosecond precision will require
bumping Impala's parquet-hadoop-bundle dependency to a new major version,
which may contain incompatible API changes.
Note that parquet-hadoop-bundle is only used in CREATE TABLE LIKE PARQUET. The
C++ parts of Impala only rely on parquet.thrift, which can be updated more
easily.
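
The unit mismatch described in the first concern above can be sketched in
Python. This is an illustration only, not Impala code: the helper names and
the sample value are hypothetical, and the cast helper assumes, as stated
above, that casting a TIMESTAMP to BIGINT yields whole seconds since the Unix
epoch.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def read_as_bigint(raw: int) -> int:
    # Column declared as BIGINT: the stored int64 is returned unchanged.
    return raw


def read_as_timestamp_micros(raw: int) -> datetime:
    # Column declared as TIMESTAMP: the int64 is interpreted as
    # microseconds since the Unix epoch.
    return EPOCH + timedelta(microseconds=raw)


def cast_timestamp_to_bigint(ts: datetime) -> int:
    # Assumed cast semantics: whole seconds since the Unix epoch.
    return (ts - EPOCH) // timedelta(seconds=1)


raw = 1_539_648_000_000_000  # hypothetical stored value, in microseconds

old_result = read_as_bigint(raw)
new_result = cast_timestamp_to_bigint(read_as_timestamp_micros(raw))

print(old_result)  # the raw microsecond count
print(new_result)  # the same instant expressed in seconds
```

A query that previously compared the column against microsecond values would
keep running after the schema change but would see values smaller by a factor
of one million, with nothing to flag the difference.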