[
https://issues.apache.org/jira/browse/IMPALA-7568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Csaba Ringhofer resolved IMPALA-7568.
-------------------------------------
Resolution: Implemented
Fix Version/s: Impala 3.2.0
> Implement timezone aware parquet stat filtering for timestamp columns
> ---------------------------------------------------------------------
>
> Key: IMPALA-7568
> URL: https://issues.apache.org/jira/browse/IMPALA-7568
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Reporter: Csaba Ringhofer
> Assignee: Csaba Ringhofer
> Priority: Major
> Labels: parquet, timestamp
> Fix For: Impala 3.2.0
>
>
> Parquet timestamp columns can contain UTC normalized data, which means that
> the data is stored in UTC but is expected to be shown in local time (to
> be consistent with Hive). This is done by converting these timestamps from
> UTC to local time during scanning.
> This conversion has to be considered during min/max stat filtering, otherwise
> some row groups can be incorrectly skipped. For this reason IMPALA-7559
> disables stat filtering on UTC normalized timestamp columns.
> This ticket deals with creating a correct implementation to be able to
> re-enable stat filtering for these columns.
> DST and historical rule changes add some complexity to this. The UTC->local
> mapping can be non-monotonic, and the local->UTC mapping can be ambiguous.
> Non-monotonic means that tMin <= t <= tMax being true in UTC does not imply
> that the same is true in local time.
> The solution I see is to convert min/max of the predicate from local to UTC
> and resolve ambiguity by choosing the earlier time in case of min, and the
> later time in case of max. These UTC values can be compared with stats safely.
> Note that the timezone rules can be different in Hive and Impala (especially
> historical ones), so we cannot ensure that Impala gives exactly the same
> results as Hive. The goal is to ensure that Impala returns the same rows with
> and without stat filtering.
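
The approach described above can be sketched as follows. This is only an
illustration in Python, not the Impala backend code: the zone is a hypothetical
one with a single fall-back transition (UTC+2 until 2018-10-28 01:00 UTC, then
UTC+1), and the names utc_to_local, local_to_utc_candidates,
predicate_bounds_to_utc and can_skip_row_group are made up for this example.
Timestamps are naive datetime values whose meaning (UTC or local) is given by
the variable name.

```python
from datetime import datetime, timedelta

# Hypothetical zone with one fall-back transition (an assumption for this
# sketch): UTC+2 before TRANSITION_UTC, UTC+1 from TRANSITION_UTC onwards.
TRANSITION_UTC = datetime(2018, 10, 28, 1, 0)
OFFSET_BEFORE = timedelta(hours=2)
OFFSET_AFTER = timedelta(hours=1)


def utc_to_local(utc):
    """Convert a naive UTC datetime to local time in the toy zone."""
    return utc + (OFFSET_BEFORE if utc < TRANSITION_UTC else OFFSET_AFTER)


def local_to_utc_candidates(local):
    """Return every UTC instant that maps to `local`, earliest first.

    Local times in the repeated hour (02:00 to 03:00 on the transition day)
    have two candidates; every other local time in this zone has exactly one.
    A zone with a spring-forward gap could yield none, which this sketch
    does not handle.
    """
    candidates = []
    for offset in (OFFSET_BEFORE, OFFSET_AFTER):
        utc = local - offset
        if utc_to_local(utc) == local:
            candidates.append(utc)
    return sorted(candidates)


def predicate_bounds_to_utc(local_min, local_max):
    """Convert predicate bounds from local time to UTC.

    Ambiguity is resolved conservatively: the earlier instant for the lower
    bound and the later instant for the upper bound.
    """
    lo = local_to_utc_candidates(local_min)[0]
    hi = local_to_utc_candidates(local_max)[-1]
    return lo, hi


def can_skip_row_group(stats_min_utc, stats_max_utc, local_min, local_max):
    """Return True if no row in [stats_min_utc, stats_max_utc] can match."""
    lo, hi = predicate_bounds_to_utc(local_min, local_max)
    return stats_max_utc < lo or stats_min_utc > hi
```

In this toy zone a row group with UTC stats 00:30 to 01:10 on 2018-10-28 covers
local times 02:30 to 03:00 and then 02:00 to 02:10, so converting the stats
themselves gives a "min" (02:30) that is later than the "max" (02:10). A
predicate such as ts = 02:50 local lies outside that converted pair, although
the row stored at 00:50 UTC matches it. Converting the predicate instead gives
the UTC range 00:50 to 01:50, which overlaps the stats, so the row group is
kept.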