[
https://issues.apache.org/jira/browse/HIVE-22111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16907221#comment-16907221
]
Peter Vary commented on HIVE-22111:
-----------------------------------
In the TxnHandler.commitTxn method when we store the new commit generated by a
replication event we do this:
{code:java}
s = "insert into COMPLETED_TXN_COMPONENTS (ctc_txnid, ctc_database, " +
    "ctc_table, ctc_partition, ctc_writeid, ctc_update_delete) select tc_txnid," +
    " tc_database, tc_table, tc_partition, tc_writeid, '" + isUpdateDelete +
    "' from TXN_COMPONENTS where tc_txnid = " + txnid +
    //we only track compactor activity in TXN_COMPONENTS to handle the case where the
    //compactor txn aborts - so don't bother copying it to COMPLETED_TXN_COMPONENTS
    " AND tc_operation_type <> " + quoteChar(OperationType.COMPACT.sqlConst);
{code}
See:
[https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L1227-L1233]
In case of replication {{isUpdateDelete}} is always 'N'.
{{TxnHandler.getMaterializationInvalidationInfo}} only considers components
whose {{ctc_update_delete}} is 'Y':
{code:java}
query.append("select ctc_update_delete from COMPLETED_TXN_COMPONENTS where ctc_update_delete='Y' AND (");
{code}
See:
[https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L2021]
As I understand it, this means the Materialized View misses the replicated
change: it is not updated, and queries against it might return wrong results.
We do not have correct update/delete information in case of replication, so the
quick fix would be to set {{isUpdateDelete}} to 'Y' whenever the commit comes
from a replication event. If everything works as I expect, we might end up
regenerating the Materialized View unnecessarily on the target cluster, but we
would ensure correct results even in this edge case.
[~jcamachorodriguez]: Would this be an acceptable tradeoff?
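To make the proposal concrete, here is a minimal, self-contained sketch of the idea. It is not the actual TxnHandler code: the class {{ReplUpdateDeleteSketch}}, the helper {{updateDeleteFlag}}, the {{fromReplication}} parameter and the in-memory row list are all hypothetical stand-ins for the metastore tables and the commit path.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of the metastore behaviour; not Hive code.
class ReplUpdateDeleteSketch {

    // One COMPLETED_TXN_COMPONENTS row, reduced to the columns relevant here.
    static final class CompletedTxnComponent {
        final long txnId;
        final String table;
        final char updateDelete; // models ctc_update_delete: 'Y' or 'N'

        CompletedTxnComponent(long txnId, String table, char updateDelete) {
            this.txnId = txnId;
            this.table = table;
            this.updateDelete = updateDelete;
        }
    }

    // Proposed quick fix (assumption): a commit coming from a replication
    // event is always flagged 'Y', because the real operation type is not
    // known on the target cluster.
    static char updateDeleteFlag(boolean fromReplication, boolean sawUpdateOrDelete) {
        if (fromReplication) {
            return 'Y';
        }
        return sawUpdateOrDelete ? 'Y' : 'N';
    }

    // Models the "ctc_update_delete='Y'" filter: only these rows can mark a
    // materialized view as outdated.
    static List<CompletedTxnComponent> invalidatingRows(List<CompletedTxnComponent> rows) {
        List<CompletedTxnComponent> result = new ArrayList<>();
        for (CompletedTxnComponent row : rows) {
            if (row.updateDelete == 'Y') {
                result.add(row);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<CompletedTxnComponent> rows = new ArrayList<>();
        // Current behaviour: a replicated update/delete is stored with 'N'.
        rows.add(new CompletedTxnComponent(1L, "base_table", 'N'));
        // With the quick fix the same replicated commit is stored with 'Y'.
        rows.add(new CompletedTxnComponent(2L, "base_table", updateDeleteFlag(true, false)));

        for (CompletedTxnComponent row : invalidatingRows(rows)) {
            System.out.println("txn " + row.txnId + " invalidates views on " + row.table);
        }
    }
}
```

In this model only the row written with the quick fix passes the filter. The cost is the one described above: an insert-only replicated commit would also be flagged 'Y' and could trigger an unnecessary rebuild.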
Thanks,
Peter
> Materialized view based on replicated table might not get refreshed
> -------------------------------------------------------------------
>
> Key: HIVE-22111
> URL: https://issues.apache.org/jira/browse/HIVE-22111
> Project: Hive
> Issue Type: Bug
> Components: Materialized views, repl
> Reporter: Peter Vary
> Assignee: Peter Vary
> Priority: Minor
>
> Consider the following scenario:
> * create a base table which we replicate
> * create a materialized view in the target hive based on the base table
> * modify (delete/update) the base table in the source hive
> * replicate the changes (delete/update) to the target hive
> * query the materialized view in the target hive
>
> We do not refresh the data, since when the transaction is created by
> replication we set ctc_update_delete to 'N'.