[ 
https://issues.apache.org/jira/browse/IMPALA-7567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csaba Ringhofer closed IMPALA-7567.
-----------------------------------
    Resolution: Duplicate

Created by mistake.

> Implement timezone aware parquet stat filtering for timestamp columns
> ---------------------------------------------------------------------
>
>                 Key: IMPALA-7567
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7567
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Csaba Ringhofer
>            Priority: Major
>              Labels: parquet, timestamp
>
> Parquet timestamp columns can contain UTC-normalized data, which means that
> the data is stored in UTC but is expected to be shown in local time (to
> be consistent with Hive). This is done by converting these timestamps from UTC
> to local time during scanning.
> This conversion has to be considered during min/max stat filtering; otherwise
> some row groups can be incorrectly skipped. For this reason, IMPALA-7559
> disables stat filtering on UTC-normalized timestamp columns.
> This ticket deals with creating a correct implementation to be able to
> re-enable stat filtering for these columns.
> DST and historical rule changes add some complexity to this. The UTC->local
> mapping can be non-monotonic, and the local->UTC mapping can be ambiguous.
> A non-monotonic mapping means that tMin <= t <= tMax holding in UTC does
> not imply that the same holds in local time.
> The solution I see is to convert the min/max of the predicate from local to
> UTC and resolve ambiguity by choosing the earlier time for the min and the
> later time for the max. These UTC values can be compared with stats safely.
> Note that the timezone rules can be different in Hive and Impala (especially
> historical ones), so we cannot ensure that Impala gives exactly the same
> results as Hive. The goal is to ensure that Impala returns the same rows with
> and without stat filtering.
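
The approach described above can be sketched in Python. This is only an illustration, not Impala's C++ implementation. It assumes Python 3.9+ with `zoneinfo` and a system time zone database. The function names (`local_to_utc_candidates`, `predicate_bounds_in_utc`, `may_contain_match`) and the `America/New_York` zone used in the usage note are hypothetical choices made for the example. The sketch converts the predicate's local min/max to UTC, takes the earlier candidate for the min and the later candidate for the max, and compares the result with the row group's UTC stats.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def local_to_utc_candidates(local_naive, tz):
    """Return the UTC instants a naive local wall-clock time may denote.

    Hypothetical helper. fold=0 and fold=1 select the two possible UTC
    offsets around a transition; away from transitions both results are equal.
    """
    return [
        local_naive.replace(tzinfo=tz, fold=fold).astimezone(timezone.utc)
        for fold in (0, 1)
    ]


def predicate_bounds_in_utc(local_min, local_max, tz):
    """Widen a local-time predicate range into a conservative UTC range."""
    # Earlier candidate for the lower bound, later candidate for the upper
    # bound, so the UTC range never becomes narrower than the local one.
    utc_min = min(local_to_utc_candidates(local_min, tz))
    utc_max = max(local_to_utc_candidates(local_max, tz))
    return utc_min, utc_max


def may_contain_match(stat_min_utc, stat_max_utc, local_min, local_max, tz):
    """Return False only if the row group can be skipped.

    stat_min_utc and stat_max_utc stand for the row group's min/max
    statistics, given here as timezone-aware UTC datetimes.
    """
    utc_min, utc_max = predicate_bounds_in_utc(local_min, local_max, tz)
    return stat_max_utc >= utc_min and stat_min_utc <= utc_max
```

For example, with `tz = ZoneInfo("America/New_York")`, the local time 2018-11-04 01:15 occurs twice, at 05:15 UTC and at 06:15 UTC. A row group whose stats span 06:10 to 06:20 UTC contains rows displayed as 01:10 to 01:20 local time, so a predicate on local time between 01:00 and 01:20 must not skip it. Converting the bounds with a single fixed interpretation would yield 05:00 to 05:20 UTC and skip the row group, whereas `predicate_bounds_in_utc` yields 05:00 to 06:20 UTC and keeps it.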



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
