[ https://issues.apache.org/jira/browse/HIVE-28975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Harshal Patel resolved HIVE-28975.
----------------------------------
    Fix Version/s: 4.1.0
       Resolution: Fixed

> [HiveAcidReplication] Remove dangling txns from Target side post incremental replication
> ----------------------------------------------------------------------------------------
>
>                 Key: HIVE-28975
>                 URL: https://issues.apache.org/jira/browse/HIVE-28975
>             Project: Hive
>          Issue Type: Improvement
>          Components: repl
>            Reporter: Harshal Patel
>            Assignee: Harshal Patel
>            Priority: Major
>              Labels: pull-request-available, replication
>             Fix For: 4.1.0
>
>
> *Context and Problem Statement:*
> Currently, due to certain inconsistencies on the Hive side, customers are frequently encountering the repl_incompatible error, triggered by different underlying issues.
> * *Current Issue:* There are missing entries in the txn_write_notification_log table for TRUNCATE operations. This causes problems when the Hive configuration property hive.repl.filter.transactions is set to true.
> To improve resiliency from the replication side, we propose a mechanism to clean up dangling transaction entries on the Disaster Recovery (DR) cluster after the incremental load completes.
> *Proposed Solution:*
> We introduce a mechanism to capture and reconcile the state of open transactions during the replication process.
> h3. *Steps:*
> # *Capture Initial Open Transactions:*
> * At the beginning of the incremental dump, capture the list of open transactions.
> * For example, this initial list might be: 1, 2, 3.
> # *Proceed with Normal Dump Process:*
> * While the dump is in progress, some transactions may complete, and new ones may start.
> * For instance, suppose transaction 1 completes and transaction 4 starts.
> # *Capture Final Open Transactions:*
> * After the dump completes, capture the list of open transactions again.
> * This list might now be: 2, 3, 4.
> * Append the new transaction (4 in this case) to the initial list and persist the combined list in a file.
> # *During Load on the DR Cluster:*
> * Here the load receives 1, 2, 3, 4 as the open transactions from the source.
> * After the load process completes, retrieve the transaction list from the repl_txn_map for the respective database.
> # *Clean Dangling Transactions:*
> * Abort the transactions on the DR cluster that are *not* present in the final list of transactions captured in step 3.
> * Conceptually, this is: remove from repl_txn_map where the transaction is not in (list of open transactions from the source).
> h3. *Rationale Behind Key Steps:*
> *Why is Step 1 Important?*
> If the initial list of open transactions is not captured, the dump process might begin with a set of transactions assumed to be in a consistent state. For example, transaction 1 was open when the dump started, so it remains open on the DR cluster after replication, even though it was closed on the source while the dump was running. Skipping this step would therefore result in the incorrect abortion of valid transactions during cleanup (step 5).
> *Why is Step 3 Important?*
> If a transaction (e.g., transaction 4) is opened after step 1, while the dump (step 2) is in progress, and is replicated as part of the dump, it must be included in the list. Otherwise, it would be incorrectly aborted during the cleanup phase (step 5).
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
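
The capture-and-reconcile steps in the description above can be illustrated with a minimal sketch. It is not the Hive implementation: the function names `txns_to_preserve` and `find_dangling` are hypothetical, repl_txn_map is modeled as a plain dictionary from source transaction ID to target transaction ID (an assumed simplification of the real table), and the extra entry 9 is an invented dangling transaction used only for illustration.

```python
def txns_to_preserve(open_at_dump_start, open_at_dump_end):
    """Combine the open-transaction lists from step 1 and step 3.

    Transactions seen only at dump end are appended to the initial list,
    mirroring "append the new transaction to the list" in step 3.
    """
    preserved = list(open_at_dump_start)
    seen = set(preserved)
    for txn_id in open_at_dump_end:
        if txn_id not in seen:
            preserved.append(txn_id)
            seen.add(txn_id)
    return preserved


def find_dangling(repl_txn_map, preserved_source_txns):
    """Return the repl_txn_map entries whose source txn is not preserved.

    repl_txn_map is assumed to map source txn ID -> target txn ID.
    The returned entries are the candidates to abort on the DR cluster.
    """
    keep = set(preserved_source_txns)
    return {src: tgt for src, tgt in repl_txn_map.items() if src not in keep}


if __name__ == "__main__":
    initial_open = [1, 2, 3]   # step 1: open at dump start
    final_open = [2, 3, 4]     # step 3: open at dump end
    preserved = txns_to_preserve(initial_open, final_open)
    print(preserved)           # [1, 2, 3, 4]

    # Hypothetical target-side state; entry 9 is the invented dangling txn.
    target_map = {1: 101, 2: 102, 3: 103, 4: 104, 9: 109}
    print(find_dangling(target_map, preserved))   # {9: 109}
```

Under these assumptions, passing only the final list (2, 3, 4) to `find_dangling` would also flag transaction 1, and passing only the initial list (1, 2, 3) would also flag transaction 4. Those are the two incorrect aborts that the rationale for steps 1 and 3 describes.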