[ 
https://issues.apache.org/jira/browse/HIVE-28975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18005078#comment-18005078
 ] 

Denys Kuzmenko commented on HIVE-28975:
---------------------------------------

Merged to master

[~harshalk], thanks for the patch!

> [HiveAcidReplication] Remove dangling txns from Target side post incremental 
> replication
> ----------------------------------------------------------------------------------------
>
>                 Key: HIVE-28975
>                 URL: https://issues.apache.org/jira/browse/HIVE-28975
>             Project: Hive
>          Issue Type: Improvement
>          Components: repl
>            Reporter: Harshal Patel
>            Assignee: Harshal Patel
>            Priority: Major
>              Labels: pull-request-available, replication
>
> *Context and Problem Statement:*
> Because of inconsistencies on the Hive side, customers frequently encounter 
> the repl_incompatible error, which several different underlying issues can 
> trigger.
>  * *Current Issue:* There are missing entries in the 
> txn_write_notification_log table for TRUNCATE operations. This causes 
> problems when the Hive configuration property hive.repl.filter.transactions 
> is set to true.
> To improve resiliency from the replication side, we propose a mechanism to 
> clean up dangling transaction entries on the Disaster Recovery (DR) cluster 
> after the incremental load completes.
> *Proposed Solution:*
> We introduce a mechanism to capture and reconcile the state of open 
> transactions during the replication process.
> h3. *Steps:*
>  # *Capture Initial Open Transactions:*
>  * At the beginning of the incremental dump, capture the list of open 
> transactions.
>  * For example, this initial list might be: 1, 2, 3.
>  # *Proceed with Normal Dump Process:*
>  * While the dump is in progress, some transactions may complete, and new 
> ones may start.
>  * For instance, suppose transaction 1 completes and transaction 4 starts.
>  # *Capture Final Open Transactions:*
>  * After the dump completes, capture the list of open transactions again.
>  * This list might now be: 2, 3, 4.
>  * Append any newly opened transaction (4 in this case) to the initial list, 
> giving 1, 2, 3, 4, and persist the combined list in a file.
>  # *During Load on the DR Cluster:*
>  * The load then receives 1, 2, 3, 4 as the open transactions from the source.
>  * After the load process completes, retrieve the transaction list from the 
> repl_txn_map for the respective database.
>  # *Clean Dangling Transactions:*
>  * Abort the transactions on the DR cluster that are *not* present in the 
> final list of transactions persisted in step 3.
>  * Conceptually: remove from repl_txn_map every entry whose transaction is 
> not in the list of open transactions from the source.
> h3. *Rationale Behind Key Steps:*
> *Why is Step 1 Important?*
> If the initial list of open transactions is not captured, the persisted list 
> would contain only the transactions still open when the dump ends. For 
> example, transaction 1 was open when the dump started, so it remains open on 
> the DR cluster after replication, even though it closed on the source while 
> the dump was running. Skipping this step would therefore result in incorrect 
> abortion of valid transactions such as transaction 1 during cleanup (step 5).
> *Why is Step 3 Important?*
> If a transaction (e.g., transaction 4) is opened after step 1, while the dump 
> (step 2) is running, and is replicated as part of the dump, it must be 
> included in the list. Otherwise, it would be incorrectly aborted during the 
> cleanup phase (step 5).
>  
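The steps in the description can be sketched as plain set arithmetic. This is only an illustration of the idea, not the merged patch: the class and method names are hypothetical, transaction ids are modelled as Long values, repl_txn_map is assumed to be reducible to a set of source transaction ids for the database, and the stale entry 99 is invented for the example.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; none of these names come from the Hive code base.
final class DanglingTxnSketch {

    // Steps 1 and 3 (source side): combine the transactions open when the
    // dump starts with those open when it ends. The result is the list that
    // would be persisted in a file alongside the dump.
    static Set<Long> openTxnsToPersist(Set<Long> openAtDumpStart, Set<Long> openAtDumpEnd) {
        Set<Long> persisted = new TreeSet<>(openAtDumpStart);
        persisted.addAll(openAtDumpEnd);
        return persisted;
    }

    // Step 5 (DR side): any source transaction id recorded in repl_txn_map
    // that is absent from the persisted list is treated as dangling.
    static Set<Long> danglingTxns(Set<Long> replTxnMapSrcTxnIds, Set<Long> persistedOpenTxns) {
        Set<Long> dangling = new TreeSet<>(replTxnMapSrcTxnIds);
        dangling.removeAll(persistedOpenTxns);
        return dangling;
    }

    public static void main(String[] args) {
        Set<Long> openAtStart = Set.of(1L, 2L, 3L); // step 1
        Set<Long> openAtEnd = Set.of(2L, 3L, 4L);   // step 3, after txn 1 closed and txn 4 opened
        Set<Long> persisted = openTxnsToPersist(openAtStart, openAtEnd);

        // Assumed repl_txn_map content after the load; 99 is an invented stale entry.
        Set<Long> replTxnMap = Set.of(1L, 2L, 3L, 4L, 99L);

        System.out.println(persisted);                           // [1, 2, 3, 4]
        System.out.println(danglingTxns(replTxnMap, persisted)); // [99]

        // Skipping step 1: only the final list is used, so txn 1 is wrongly flagged.
        System.out.println(danglingTxns(replTxnMap, openAtEnd)); // [1, 99]
    }
}
```

The last call shows the point made under "Why is Step 1 Important?": without the initial list, transaction 1 would be aborted during cleanup even though the load still treats it as open.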



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
