[
https://issues.apache.org/jira/browse/HIVE-28975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Denys Kuzmenko reopened HIVE-28975:
-----------------------------------
> [HiveAcidReplication] Remove dangling txns from Target side post incremental
> replication
> ----------------------------------------------------------------------------------------
>
> Key: HIVE-28975
> URL: https://issues.apache.org/jira/browse/HIVE-28975
> Project: Hive
> Issue Type: Improvement
> Components: repl
> Reporter: Harshal Patel
> Assignee: Harshal Patel
> Priority: Major
> Labels: pull-request-available, replication
>
> *Context and Problem Statement:*
> Currently, due to certain inconsistencies on the Hive side, customers are
> frequently encountering the repl_incompatible error, triggered by different
> underlying issues.
> * *Current Issue:* There are missing entries in the
> txn_write_notification_log table for TRUNCATE operations. This causes
> problems when the Hive configuration property hive.repl.filter.transactions
> is set to true.
> To improve resiliency from the replication side, we propose a mechanism to
> clean up dangling transaction entries on the Disaster Recovery (DR) cluster
> after the incremental load completes.
> *Proposed Solution:*
> We introduce a mechanism to capture and reconcile the state of open
> transactions during the replication process.
> h3. *Steps:*
> # *Capture Initial Open Transactions:*
> * At the beginning of the incremental dump, capture the list of open
> transactions.
> * For example, this initial list might be: 1, 2, 3.
> # *Proceed with Normal Dump Process:*
> * While the dump is in progress, some transactions may complete, and new
> ones may start.
> * For instance, suppose transaction 1 completes and transaction 4 starts.
> # *Capture Final Open Transactions:*
> * After the dump completes, capture the list of open transactions again.
> * This list might now be: 2, 3, 4.
> * Append the newly opened transaction (4 in this case) to the initial list,
> giving 1, 2, 3, 4, and persist the result in a file.
> # *During Load on the DR Cluster:*
> * At this point, the load has transactions 1, 2, 3, and 4 as open
> transactions from the source.
> * After the load process completes, retrieve the transaction list from the
> repl_txn_map for the respective database.
> # *Clean Dangling Transactions:*
> * Abort the transactions on the DR cluster that are *not* present in the
> final list of transactions captured in step 3.
> * In effect: remove from repl_txn_map every entry that is not in the list of
> open transactions from the source.
> h3. *Rationale Behind Key Steps:*
> *Why is Step 1 Important?*
> If the initial list of open transactions is not captured, cleanup would rely
> only on the list taken after the dump and would miss transactions that closed
> in between. For example, transaction 1 was open when the dump started, so it
> remains open on the DR cluster after replication, even though it was closed
> on the source while the dump was running. Skipping this step would therefore
> result in valid transactions being aborted incorrectly during cleanup (step 5).
> *Why is Step 3 Important?*
> If a transaction (e.g., transaction 4) is opened after the initial capture in
> step 1 (for example, while the dump in step 2 is running) and is replicated as
> part of the dump, it must be included in the list. Otherwise, it would be
> incorrectly aborted during the cleanup phase (step 5).
>
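The reconciliation in steps 1 to 5 can be sketched as below. This is only an illustration of the list arithmetic, not the Hive implementation: the function names are hypothetical, repl_txn_map is modeled as a plain dict from source transaction ID to target transaction ID, and the target IDs (100 to 104) and the leftover source transaction 0 are made-up values.

```python
def txns_to_persist(open_at_start, open_at_end):
    """Steps 1-3: the initial open list plus transactions newly open at dump end."""
    seen = set(open_at_start)
    newly_opened = [t for t in open_at_end if t not in seen]
    return list(open_at_start) + newly_opened


def dangling_target_txns(repl_txn_map, source_open_txns):
    """Step 5: target txn IDs whose source txn is absent from the persisted list.

    repl_txn_map is a hypothetical stand-in: {source_txn_id: target_txn_id}.
    """
    keep = set(source_open_txns)
    return sorted(tgt for src, tgt in repl_txn_map.items() if src not in keep)


# Example from the description: 1, 2, 3 open at dump start; 2, 3, 4 open at dump end.
persisted = txns_to_persist([1, 2, 3], [2, 3, 4])

# Hypothetical state on the DR cluster after the load; source txn 0 is a
# leftover entry whose closing event never arrived.
repl_txn_map = {0: 100, 1: 101, 2: 102, 3: 103, 4: 104}

print(persisted)                                      # [1, 2, 3, 4]
print(dangling_target_txns(repl_txn_map, persisted))  # [100]
```

Under the same assumptions, persisting only the end-of-dump list (2, 3, 4) would also mark target transaction 101 for abort, and persisting only the initial list (1, 2, 3) would mark 104, which matches the failure cases described for steps 1 and 3.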