Stamatis Zampetakis created HIVE-27948:
------------------------------------------
Summary: Wrong results when using materialized views with
non-deterministic/dynamic functions
Key: HIVE-27948
URL: https://issues.apache.org/jira/browse/HIVE-27948
Project: Hive
Issue Type: Bug
Components: CBO
Affects Versions: 4.0.0-beta-1
Reporter: Stamatis Zampetakis
There are certain SQL functions that return different results across different
executions. Usually we refer to these functions as non-deterministic or
dynamic. Some examples are: UNIX_TIMESTAMP(), CURRENT_TIMESTAMP, CURRENT_DATE,
etc.
When a materialized view definition contains such functions the queries that
are using this view may return wrong results.
Consider the following scenario where we populate the employee table with
timestamps representing the future. For making this easily reproable in
self-contained test the timestamps are only a few seconds apart.
{code:sql}
CREATE TABLE EMPS (ENAME STRING, BIRTH_EPOCH_SECS INT) STORED AS ORC
TBLPROPERTIES ('transactional'='true');
INSERT INTO EMPS
VALUES ('Victor', UNIX_TIMESTAMP()),
('Alex', UNIX_TIMESTAMP() + 2),
('Bob', UNIX_TIMESTAMP() + 5),
('Alice', UNIX_TIMESTAMP() + 10);
CREATE MATERIALIZED VIEW v_emp AS SELECT * FROM EMPS WHERE BIRTH_EPOCH_SECS <=
UNIX_TIMESTAMP();
{code}
When the materialized view is created it is populated with only the rows that
match the timestamp at the given time.
To demonstrate the problem run the following queries with view based rewritting
disabled and enabled.
{code:sql}
set hive.materializedview.rewriting.sql=false;
SELECT * FROM EMPS WHERE BIRTH_EPOCH_SECS <= UNIX_TIMESTAMP();
{code}
{noformat}
Victor 1702302786
Alex 1702302788
Bob 1702302791
{noformat}
{code:sql}
set hive.materializedview.rewriting.sql=true;
SELECT * FROM EMPS WHERE BIRTH_EPOCH_SECS <= UNIX_TIMESTAMP();
{code}
{noformat}
Victor 1702302786
Alex 1702302788
{noformat}
Naturally the second query should return more rows than the first one since
UNIX_TIMESTAMP is constantly growing. However, when view based rewritting is in
use the second query will use the results from the materialized view which are
by now obsolete (missing Bob entry).
--
This message was sent by Atlassian Jira
(v8.20.10#820010)