[ 
https://issues.apache.org/jira/browse/IMPALA-8077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csaba Ringhofer updated IMPALA-8077:
------------------------------------
    Description: 
If flag convert_legacy_hive_parquet_utc_timestamps is true, then every 
TIMESTAMP value is converted from UTC to local time during Parquet scanning. 
This is done during column decoding, and Impala materializes every column 
before calculating the WHERE predicate, so if a timestamp column is not in the 
predicate, then the conversion is unnecessarily done in rows that fail the 
predicate.

Example:
CREATE TABLE t (id INT, ts TIMESTAMP) STORED AS PARQUET;
SELECT * FROM t WHERE id = 1;

Timezone conversion will be done for every 'ts', even if the predicate matches 
only a single row (lets ignore stat and dictionary filtering). The CPU time of 
the query above is likely to be dominated by timezone conversion, especially if 
the query is very selective.

  was:
If flag convert_legacy_hive_parquet_utc_timestamps is true, then every 
TIMESTAMP value is converted from UTC to local time during Parquet scanning. 
This is done during column decoding, and Impala materializes every column 
before calculating the WHERE predicate, so if a timestamp column is not in the 
predicate, then the conversion is unnecessarily done in rows that fail the 
predicate.

Example:
CREATE TABLE t (id INT, ts TIMESTAMP) STORED AS PARQUET;
SELECT * FROM t WHERE id = 1;

Timezone conversion will be done for every 'ts', even the predicate matches 
only a single row (lets ignore stat and dictionary filtering). The CPU time of 
the query above is likely to be dominated by timezone conversion, especially if 
the query is very selective.


> Avoid converting timestamps in dropped rows during Parquet scanning
> -------------------------------------------------------------------
>
>                 Key: IMPALA-8077
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8077
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>            Reporter: Csaba Ringhofer
>            Priority: Major
>              Labels: parquet, performance, timestamp
>
> If flag convert_legacy_hive_parquet_utc_timestamps is true, then every 
> TIMESTAMP value is converted from UTC to local time during Parquet scanning. 
> This is done during column decoding, and Impala materializes every column 
> before calculating the WHERE predicate, so if a timestamp column is not in 
> the predicate, then the conversion is unnecessarily done in rows that fail 
> the predicate.
> Example:
> CREATE TABLE t (id INT, ts TIMESTAMP) STORED AS PARQUET;
> SELECT * FROM t WHERE id = 1;
> Timezone conversion will be done for every 'ts', even if the predicate 
> matches only a single row (lets ignore stat and dictionary filtering). The 
> CPU time of the query above is likely to be dominated by timezone conversion, 
> especially if the query is very selective.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to