[ 
https://issues.apache.org/jira/browse/HIVE-26375?focusedWorklogId=793591&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-793591
 ]

ASF GitHub Bot logged work on HIVE-26375:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 21/Jul/22 08:20
            Start Date: 21/Jul/22 08:20
    Worklog Time Spent: 10m 
      Work Description: kasakrisz commented on code in PR #3420:
URL: https://github.com/apache/hive/pull/3420#discussion_r926392298


##########
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestMaterializedViewRebuild.java:
##########
@@ -97,7 +91,7 @@ public void testWhenMajorCompactionThenIncrementalMVRebuildIsStillAvailable() th
     txnHandler.cleanTxnToWriteIdTable();
 
     List<String> result = execSelectAndDumpData("explain cbo alter materialized view " + MV1 + " rebuild", driver, "");
-    Assert.assertEquals(INCREMENTAL_REBUILD_PLAN, result);
+    Assert.assertEquals(FULL_REBUILD_PLAN, result);

Review Comment:
   At MV rebuild we search `COMPLETED_TXN_COMPONENTS` for update/delete 
operations on the affected source tables. Records are deleted from this table 
at compaction, so after compaction we can no longer confirm whether there were 
any deletes on any of the source tables. This matters because executing an 
incremental rebuild plan that expects only insert operations on all source 
tables, when there were in fact deletes, leads to data corruption in the 
refreshed view.
   
   The second rebuild can be incremental, since the first rebuild resets the 
source tables' snapshot to a fresh one and the txn data of operations done since 
that first rebuild still exists in `COMPLETED_TXN_COMPONENTS`.
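   
   The rule described above can be sketched roughly as follows. This is only an 
illustration of the reasoning, not the code in this PR: `RebuildModeSketch`, 
`ChangeHistory` and `insertOnlyPlanIsSafe` are hypothetical names, and the 
sketch assumes that each source table's history can be summarized by a single 
three-valued flag.

```java
import java.util.List;

// Hypothetical sketch: none of these names are Hive APIs.
final class RebuildModeSketch {

    // What the transaction metadata can still tell us about one source table
    // since the last MV rebuild (assumed three-valued summary).
    enum ChangeHistory {
        INSERT_ONLY,        // history is intact and shows only inserts
        HAS_UPDATE_DELETE,  // history is intact and shows updates or deletes
        UNKNOWN             // history was purged, for example by compaction
    }

    // The insert-only incremental plan is safe only if every source table
    // is positively known to have received nothing but inserts.
    static boolean insertOnlyPlanIsSafe(List<ChangeHistory> sourceTables) {
        return sourceTables.stream()
                .allMatch(h -> h == ChangeHistory.INSERT_ONLY);
    }
}
```

   With this rule a compacted source table (`UNKNOWN`) rules out the 
insert-only plan, which matches the changed assertion in the test above.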





Issue Time Tracking
-------------------

    Worklog Id:     (was: 793591)
    Time Spent: 0.5h  (was: 20m)

> Invalid materialized view after rebuild if source table was compacted
> ---------------------------------------------------------------------
>
>                 Key: HIVE-26375
>                 URL: https://issues.apache.org/jira/browse/HIVE-26375
>             Project: Hive
>          Issue Type: Bug
>          Components: Materialized views, Transactions
>            Reporter: Krisztian Kasa
>            Assignee: Krisztian Kasa
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> After HIVE-25656, MV state depends on the number of rows deleted/updated in 
> the source tables of the view. However, if one of the source tables is major 
> compacted, the delete delta files are no longer available, and reproducing the 
> rows that should be deleted from the MV is no longer possible.
> {code}
> create table t1(a int, b varchar(128), c float) stored as orc TBLPROPERTIES ('transactional'='true');
> insert into t1(a, b, c) values (1, 'one', 1.1), (2, 'two', 2.2), (NULL, NULL, NULL);
> create materialized view mv1 stored as orc TBLPROPERTIES ('transactional'='true') as select a,b,c from t1 where a > 0 or a is null;
> update t1 set b = 'Changed' where a = 1;
> alter table t1 compact 'major';
> alter materialized view mv1 rebuild;
> select * from mv1;
> {code}
> Select should result in
> {code}
>       "1\tChanged\t1.1",
>       "2\ttwo\t2.2",
>       "NULL\tNULL\tNULL"
> {code}
> but was
> {code}
>       "1\tone\t1.1",
>       "2\ttwo\t2.2",
>       "NULL\tNULL\tNULL",
>       "1\tChanged\t1.1"
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
