[
https://issues.apache.org/jira/browse/HIVE-25546?focusedWorklogId=658226&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-658226
]
ASF GitHub Bot logged work on HIVE-25546:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 30/Sep/21 09:52
Start Date: 30/Sep/21 09:52
Worklog Time Spent: 10m
Work Description: zabetak commented on a change in pull request #2663:
URL: https://github.com/apache/hive/pull/2663#discussion_r719238961
##########
File path:
ql/src/test/queries/clientpositive/materialized_view_create_rewrite_8.q
##########
@@ -0,0 +1,97 @@
+-- Test Incremental rebuild of materialized view without aggregate when a
source table is insert only.
+
+SET hive.vectorized.execution.enabled=false;
+set hive.support.concurrency=true;
+set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
+set hive.strict.checks.cartesian.product=false;
Review comment:
Why do the properties below matter for the incremental rebuild?
```
hive.vectorized.execution.enabled=false
hive.strict.checks.cartesian.product=false
```
##########
File path:
ql/src/test/queries/clientpositive/materialized_view_create_rewrite_8.q
##########
@@ -0,0 +1,97 @@
+-- Test Incremental rebuild of materialized view without aggregate when a
source table is insert only.
+
+SET hive.vectorized.execution.enabled=false;
+set hive.support.concurrency=true;
+set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
+set hive.strict.checks.cartesian.product=false;
+set hive.materializedview.rewriting=true;
+
+create table cmv_basetable_n6 (a int, b varchar(256), c decimal(10,2), d int)
stored as orc TBLPROPERTIES ('transactional'='true');
+
+insert into cmv_basetable_n6 values
+ (1, 'alfred', 10.30, 2),
+ (2, 'bob', 3.14, 3),
+ (2, 'bonnie', 172342.2, 3),
+ (3, 'calvin', 978.76, 3),
+ (3, 'charlie', 9.8, 1);
+
+create table cmv_basetable_2_n3 (a int, b varchar(256), c decimal(10,2), d
int) stored as orc TBLPROPERTIES ('transactional'='true',
'transactional_properties'='insert_only');
+
+insert into cmv_basetable_2_n3 values
+ (1, 'alfred', 10.30, 2),
+ (3, 'calvin', 978.76, 3);
+
+CREATE MATERIALIZED VIEW cmv_mat_view_n6
+ TBLPROPERTIES ('transactional'='true') AS
+ SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c
+ FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a =
cmv_basetable_2_n3.a)
+ WHERE cmv_basetable_2_n3.c > 10.0;
+
+insert into cmv_basetable_2_n3 values
+ (3, 'charlie', 15.8, 1);
+
+-- CANNOT USE THE VIEW, IT IS OUTDATED
+EXPLAIN CBO
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 join cmv_basetable_2_n3 ON (cmv_basetable_n6.a =
cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;
+
+EXPLAIN
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 join cmv_basetable_2_n3 ON (cmv_basetable_n6.a =
cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;
+
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a =
cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;
+
+-- REBUILD
+EXPLAIN
+ALTER MATERIALIZED VIEW cmv_mat_view_n6 REBUILD;
Review comment:
Is it necessary?
##########
File path: ql/src/test/queries/clientpositive/materialized_view_parquet.q
##########
@@ -189,8 +189,10 @@ alter materialized view mv1_parquet_n2 rebuild;
alter materialized view mv1_parquet_n2 rebuild;
explain cbo
-select name from emps_parquet_n3 group by name;
+select name, sum(empid) from emps_parquet_n3 group by name;
Review comment:
Why are we changing existing queries? If it is useless then it's fine,
if we want to test additional stuff then we should add new queries instead of
modifying existing ones.
##########
File path:
ql/src/test/queries/clientpositive/materialized_view_create_rewrite_8.q
##########
@@ -0,0 +1,97 @@
+-- Test Incremental rebuild of materialized view without aggregate when a
source table is insert only.
+
+SET hive.vectorized.execution.enabled=false;
+set hive.support.concurrency=true;
+set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
+set hive.strict.checks.cartesian.product=false;
+set hive.materializedview.rewriting=true;
+
+create table cmv_basetable_n6 (a int, b varchar(256), c decimal(10,2), d int)
stored as orc TBLPROPERTIES ('transactional'='true');
+
+insert into cmv_basetable_n6 values
+ (1, 'alfred', 10.30, 2),
+ (2, 'bob', 3.14, 3),
+ (2, 'bonnie', 172342.2, 3),
+ (3, 'calvin', 978.76, 3),
+ (3, 'charlie', 9.8, 1);
+
+create table cmv_basetable_2_n3 (a int, b varchar(256), c decimal(10,2), d
int) stored as orc TBLPROPERTIES ('transactional'='true',
'transactional_properties'='insert_only');
+
+insert into cmv_basetable_2_n3 values
+ (1, 'alfred', 10.30, 2),
+ (3, 'calvin', 978.76, 3);
+
+CREATE MATERIALIZED VIEW cmv_mat_view_n6
+ TBLPROPERTIES ('transactional'='true') AS
+ SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c
+ FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a =
cmv_basetable_2_n3.a)
+ WHERE cmv_basetable_2_n3.c > 10.0;
+
+insert into cmv_basetable_2_n3 values
+ (3, 'charlie', 15.8, 1);
+
+-- CANNOT USE THE VIEW, IT IS OUTDATED
+EXPLAIN CBO
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 join cmv_basetable_2_n3 ON (cmv_basetable_n6.a =
cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;
+
+EXPLAIN
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 join cmv_basetable_2_n3 ON (cmv_basetable_n6.a =
cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;
Review comment:
I had the impression that view based rewritting takes place in CBO.
Isn't `EXPLAIN CBO` sufficient?
##########
File path:
ql/src/test/queries/clientpositive/materialized_view_create_rewrite_8.q
##########
@@ -0,0 +1,97 @@
+-- Test Incremental rebuild of materialized view without aggregate when a
source table is insert only.
+
+SET hive.vectorized.execution.enabled=false;
+set hive.support.concurrency=true;
+set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
+set hive.strict.checks.cartesian.product=false;
+set hive.materializedview.rewriting=true;
+
+create table cmv_basetable_n6 (a int, b varchar(256), c decimal(10,2), d int)
stored as orc TBLPROPERTIES ('transactional'='true');
+
+insert into cmv_basetable_n6 values
+ (1, 'alfred', 10.30, 2),
+ (2, 'bob', 3.14, 3),
+ (2, 'bonnie', 172342.2, 3),
+ (3, 'calvin', 978.76, 3),
+ (3, 'charlie', 9.8, 1);
+
+create table cmv_basetable_2_n3 (a int, b varchar(256), c decimal(10,2), d
int) stored as orc TBLPROPERTIES ('transactional'='true',
'transactional_properties'='insert_only');
+
+insert into cmv_basetable_2_n3 values
+ (1, 'alfred', 10.30, 2),
+ (3, 'calvin', 978.76, 3);
+
+CREATE MATERIALIZED VIEW cmv_mat_view_n6
+ TBLPROPERTIES ('transactional'='true') AS
+ SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c
+ FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a =
cmv_basetable_2_n3.a)
+ WHERE cmv_basetable_2_n3.c > 10.0;
+
+insert into cmv_basetable_2_n3 values
+ (3, 'charlie', 15.8, 1);
+
+-- CANNOT USE THE VIEW, IT IS OUTDATED
+EXPLAIN CBO
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 join cmv_basetable_2_n3 ON (cmv_basetable_n6.a =
cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;
+
+EXPLAIN
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 join cmv_basetable_2_n3 ON (cmv_basetable_n6.a =
cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;
+
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a =
cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;
+
+-- REBUILD
+EXPLAIN
+ALTER MATERIALIZED VIEW cmv_mat_view_n6 REBUILD;
+
+ALTER MATERIALIZED VIEW cmv_mat_view_n6 REBUILD;
+
+-- NOW IT CAN BE USED AGAIN
+EXPLAIN CBO
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 join cmv_basetable_2_n3 ON (cmv_basetable_n6.a =
cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;
+
+EXPLAIN
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 join cmv_basetable_2_n3 ON (cmv_basetable_n6.a =
cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;
+
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a =
cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;
+
+-- NOW AN UPDATE
+UPDATE cmv_basetable_n6 SET a=2 WHERE a=1;
+
+-- INCREMENTAL REBUILD CAN BE TRIGGERED
+EXPLAIN
+ALTER MATERIALIZED VIEW cmv_mat_view_n6 REBUILD;
+
+ALTER MATERIALIZED VIEW cmv_mat_view_n6 REBUILD;
+
+-- MV CAN BE USED
+EXPLAIN CBO
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 join cmv_basetable_2_n3 ON (cmv_basetable_n6.a =
cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;
+
+EXPLAIN
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 join cmv_basetable_2_n3 ON (cmv_basetable_n6.a =
cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;
+
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a =
cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;
+
+drop materialized view cmv_mat_view_n6;
+
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a =
cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;
Review comment:
If the intention is to test that the view is not used after dropping it
then we should have `EXPLAIN CBO` here. Ensuring that results are correct when
the views are not used is not really necessary in my opinion.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 658226)
Time Spent: 40m (was: 0.5h)
> Enable incremental rebuild of Materialized views with insert only source
> tables
> -------------------------------------------------------------------------------
>
> Key: HIVE-25546
> URL: https://issues.apache.org/jira/browse/HIVE-25546
> Project: Hive
> Issue Type: Improvement
> Components: CBO, Materialized views
> Reporter: Krisztian Kasa
> Assignee: Krisztian Kasa
> Priority: Major
> Labels: pull-request-available
> Time Spent: 40m
> Remaining Estimate: 0h
>
> {code}
> create table t1(a int, b int, c int) stored as parquet TBLPROPERTIES
> ('transactional'='true', 'transactional_properties'='insert_only');
> create materialized view mat1 stored as orc TBLPROPERTIES
> ('transactional'='true') as
> select a, b, c from t1 where a > 10;
> {code}
> Currently materialized view *mat1* can not be rebuilt incrementally because
> it has an insert only source table (t1). Such tables does not have
> ROW_ID.write_id which is required to identify newly inserted records since
> the last rebuild.
> HIVE-25406 adds the ability to query write_id.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)