[ 
https://issues.apache.org/jira/browse/HIVE-27948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-27948:
----------------------------------
    Status: Patch Available  (was: Open)

> Wrong results when using materialized views with non-deterministic/dynamic 
> functions
> ------------------------------------------------------------------------------------
>
>                 Key: HIVE-27948
>                 URL: https://issues.apache.org/jira/browse/HIVE-27948
>             Project: Hive
>          Issue Type: Bug
>          Components: CBO
>    Affects Versions: 4.0.0-beta-1
>            Reporter: Stamatis Zampetakis
>            Assignee: Krisztian Kasa
>            Priority: Critical
>              Labels: pull-request-available
>         Attachments: materialized_view_unix_timestamp.q
>
>
> There are certain SQL functions that return different results across 
> different executions. Usually we refer to these functions as 
> non-deterministic or dynamic. Some examples are: UNIX_TIMESTAMP(), 
> CURRENT_TIMESTAMP, CURRENT_DATE, etc.
> When a materialized view definition contains such functions the queries that 
> are using this view may return wrong results.
> Consider the following scenario where we populate the employee table with 
> timestamps representing the future. For making this easily reproable in 
> self-contained test the timestamps are only a few seconds apart.
> {code:sql}
> CREATE TABLE EMPS (ENAME STRING, BIRTH_EPOCH_SECS INT) STORED AS ORC 
> TBLPROPERTIES ('transactional'='true');
> INSERT INTO EMPS
> VALUES ('Victor', UNIX_TIMESTAMP()),
>        ('Alex', UNIX_TIMESTAMP() + 2),
>        ('Bob', UNIX_TIMESTAMP() + 5),
>        ('Alice', UNIX_TIMESTAMP() + 10);
> CREATE MATERIALIZED VIEW v_emp AS SELECT * FROM EMPS WHERE BIRTH_EPOCH_SECS 
> <= UNIX_TIMESTAMP();
> {code}
> When the materialized view is created it is populated with only the rows that 
> match the timestamp at the given time.
> To demonstrate the problem run the following queries with view based 
> rewritting disabled and enabled.
> {code:sql}
> set hive.materializedview.rewriting.sql=false;
> SELECT * FROM EMPS WHERE BIRTH_EPOCH_SECS <= UNIX_TIMESTAMP();
> {code}
> {noformat}
> Victor        1702302786
> Alex  1702302788
> Bob   1702302791
> {noformat}
> {code:sql}
> set hive.materializedview.rewriting.sql=true;
> SELECT * FROM EMPS WHERE BIRTH_EPOCH_SECS <= UNIX_TIMESTAMP();
> {code}
> {noformat}
> Victor        1702302786
> Alex  1702302788
> {noformat}
> Naturally the second query should return more rows than the first one since 
> UNIX_TIMESTAMP is constantly growing. However, when view based rewritting is 
> in use the second query will use the results from the materialized view which 
> are by now obsolete (missing Bob entry).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to