Csaba Ringhofer created IMPALA-8077:
---------------------------------------
Summary: Avoid converting timestamps in dropped rows during
Parquet scanning
Key: IMPALA-8077
URL: https://issues.apache.org/jira/browse/IMPALA-8077
Project: IMPALA
Issue Type: Improvement
Components: Backend
Reporter: Csaba Ringhofer
If the flag convert_legacy_hive_parquet_utc_timestamps is true, every
TIMESTAMP value is converted from UTC to local time during Parquet scanning.
The conversion happens during column decoding, and Impala materializes every
column before evaluating the WHERE predicate. As a result, if a timestamp
column is not referenced by the predicate, the conversion is still performed,
unnecessarily, for rows that fail the predicate.
Example:
CREATE TABLE t (id INT, ts TIMESTAMP) STORED AS PARQUET;
SELECT * FROM t WHERE id = 1;
Timezone conversion will be done for every 'ts', even if the predicate matches
only a single row (let's ignore statistics and dictionary filtering). The CPU
time of the query above is likely to be dominated by timezone conversion,
especially if the query is very selective.
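
To make the cost difference concrete, here is a minimal Python sketch, not Impala's actual scanner code. It contrasts converting every 'ts' before filtering with filtering on 'id' first and converting only the surviving rows. The names CountingConverter, scan_eager and scan_deferred are hypothetical, and a fixed one-hour offset stands in for a real UTC-to-local conversion.

```python
from datetime import datetime, timedelta


class CountingConverter:
    """Hypothetical stand-in for UTC-to-local conversion that counts its calls."""

    def __init__(self, offset):
        self.offset = offset  # assumed fixed offset instead of a real time zone
        self.calls = 0

    def __call__(self, ts):
        self.calls += 1
        return ts + self.offset


def scan_eager(ids, timestamps, wanted_id, convert):
    """Behavior described in this issue: convert every 'ts', then filter."""
    converted = [convert(ts) for ts in timestamps]
    return [(i, ts) for i, ts in zip(ids, converted) if i == wanted_id]


def scan_deferred(ids, timestamps, wanted_id, convert):
    """Proposed behavior: evaluate the predicate first, convert survivors only."""
    return [(i, convert(ts)) for i, ts in zip(ids, timestamps) if i == wanted_id]


# 1000 rows with distinct ids, so 'id = 1' matches exactly one row.
ids = list(range(1000))
base = datetime(2019, 1, 14, 12, 0, 0)
timestamps = [base + timedelta(seconds=i) for i in ids]

eager_convert = CountingConverter(timedelta(hours=1))
eager_rows = scan_eager(ids, timestamps, 1, eager_convert)
eager_calls = eager_convert.calls

deferred_convert = CountingConverter(timedelta(hours=1))
deferred_rows = scan_deferred(ids, timestamps, 1, deferred_convert)
deferred_calls = deferred_convert.calls
```

Both functions return the same rows, but the eager variant calls the converter once per scanned row, while the deferred variant calls it once per matching row. The more selective the predicate, the larger the gap.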