[ 
https://issues.apache.org/jira/browse/HIVE-24840?focusedWorklogId=568163&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-568163
 ]

ASF GitHub Bot logged work on HIVE-24840:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 18/Mar/21 08:04
            Start Date: 18/Mar/21 08:04
    Worklog Time Spent: 10m 
      Work Description: kasakrisz opened a new pull request #2088:
URL: https://github.com/apache/hive/pull/2088


   ### What changes were proposed in this pull request?
   When checking the validity of a materialized view, also check whether any of 
its source tables has been compacted since the last materialized view rebuild.
   
   ### Why are the changes needed?
   During a materialized view rebuild we choose between an incremental and a 
full rebuild.
   To make this choice, the existing implementation searches the 
`COMPLETED_TXN_COMPONENTS` table (Metastore) for delete transactions that have 
affected the source tables of the MV since the last rebuild. However, these 
records are deleted during compaction. This leads to corrupted materialized 
view datasets, because the incremental rebuild is then chosen even though it 
does not handle deleted records.
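
The decision described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the Hive implementation: `ToyMetastore`, `choose_rebuild`, and the `check_compaction` flag are hypothetical names, and transaction ids are plain integers. It shows how compaction hides a delete from the old check, and how an extra compaction check makes the rebuild fall back to a full rebuild.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMetastore:
    """Hypothetical stand-in for the metastore state; not a Hive API."""
    # (table, txn_id) pairs standing in for delete/update records
    # in COMPLETED_TXN_COMPONENTS.
    delete_txns: list = field(default_factory=list)
    # (table, txn_id) pairs marking a finished compaction (assumed bookkeeping).
    compactions: list = field(default_factory=list)

    def record_delete(self, table, txn_id):
        self.delete_txns.append((table, txn_id))

    def compact(self, table, txn_id):
        # Compaction removes the table's delete records, as described above.
        self.delete_txns = [(t, x) for (t, x) in self.delete_txns if t != table]
        self.compactions.append((table, txn_id))


def choose_rebuild(ms, source_tables, last_rebuild_txn, check_compaction):
    """Return "full" or "incremental" for a materialized view rebuild."""
    def touched_since_rebuild(entries):
        return any(t in source_tables and x > last_rebuild_txn
                   for t, x in entries)

    has_deletes = touched_since_rebuild(ms.delete_txns)
    compacted = check_compaction and touched_since_rebuild(ms.compactions)
    return "full" if has_deletes or compacted else "incremental"


ms = ToyMetastore()
ms.record_delete("t1", txn_id=5)   # delete from t1 after the last rebuild (txn 3)
before = choose_rebuild(ms, {"t1"}, 3, check_compaction=False)
ms.compact("t1", txn_id=6)         # major compaction drops the delete record
old_check = choose_rebuild(ms, {"t1"}, 3, check_compaction=False)
new_check = choose_rebuild(ms, {"t1"}, 3, check_compaction=True)
print(before, old_check, new_check)
```

In this model the old check wrongly picks the incremental rebuild once the delete record is gone, while the compaction-aware check still picks the full rebuild.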
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. Queries on the materialized view, and queries whose plan is rewritten 
to scan the materialized view, may produce different results.
   Only transactional materialized views are affected.
   
   ### How was this patch tested?
   ```
   mvn test -Dtest.output.overwrite -DskipSparkTests \
     -Dtest=TestMiniLlapLocalCliDriver -Dqfile=materialized_view_create_rewrite_4.q \
     -pl itests/qtest -Pitests
   mvn test -Dtest=TestMaterializedViewRebuild -pl itests/hive-unit -Pitests
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

            Worklog Id:     (was: 568163)
    Remaining Estimate: 0h
            Time Spent: 10m

> Materialized View incremental rebuild produces wrong result set after 
> compaction
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-24840
>                 URL: https://issues.apache.org/jira/browse/HIVE-24840
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Krisztian Kasa
>            Assignee: Krisztian Kasa
>            Priority: Critical
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code}
> create table t1(a int, b varchar(128), c float) stored as orc TBLPROPERTIES 
> ('transactional'='true');
> insert into t1(a,b, c) values (1, 'one', 1.1), (2, 'two', 2.2), (NULL, NULL, 
> NULL);
> create materialized view mat1 stored as orc TBLPROPERTIES 
> ('transactional'='true') as 
>             select a,b,c from t1 where a > 0 or a is null;
> delete from t1 where a = 1;
> alter table t1 compact 'major';
> -- Wait until compaction finished.
> alter materialized view mat1 rebuild;
> {code}
> Expected result of query
> {code}
> select * from mat1;
> {code}
> {code}
> 2 two 2
> NULL NULL NULL
> {code}
> but if incremental rebuild is enabled the result is
> {code}
> 1 one 1
> 2 two 2
> NULL NULL NULL
> {code}
> Cause: Incremental rebuild checks the COMPLETED_TXN_COMPONENTS table in the 
> metastore to find out whether the source tables of a materialized view have 
> had delete or update transactions since the last rebuild. However, when a 
> major compaction is performed on the source tables, the records related to 
> these tables are deleted from COMPLETED_TXN_COMPONENTS.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
