[ 
https://issues.apache.org/jira/browse/HIVE-29459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-29459 started by Harshal Patel.
--------------------------------------------
> [DR][ACIDReplication] Add clearDanglingTxnTaskTask at the end
> -------------------------------------------------------------
>
>                 Key: HIVE-29459
>                 URL: https://issues.apache.org/jira/browse/HIVE-29459
>             Project: Hive
>          Issue Type: Bug
>          Components: repl
>    Affects Versions: 4.2.0
>            Reporter: Harshal Patel
>            Assignee: Harshal Patel
>            Priority: Major
>              Labels: replication
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> Currently, clearDanglingTxnTaskTask is added at the end of ReplLoadTask.
> That works in the normal scenario:
>  
> {code:java}
> if (conf.getBoolVar(HiveConf.ConfVars.HIVE_REPL_CLEAR_DANGLING_TXNS_ON_TARGET)) {
>   ClearDanglingTxnWork clearDanglingTxnWork =
>       new ClearDanglingTxnWork(work.getDumpDirectory(), targetDb.getName());
>   Task<ClearDanglingTxnWork> clearDanglingTxnTaskTask =
>       TaskFactory.get(clearDanglingTxnWork, conf);
>   if (childTasks.isEmpty()) {
>     childTasks.add(clearDanglingTxnTaskTask);
>   } else {
>     DAGTraversal.traverse(childTasks,
>         new AddDependencyToLeaves(Collections.singletonList(clearDanglingTxnTaskTask)));
>   }
> }
> return 0;
> {code}
>  
> [https://github.com/apache/hive/blob/38a963540000729f0ac8e8d2ac9cd1ca22930d2a/ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java#L966]
> But if the number of events for an incremental load exceeds
> {{hive.repl.approx.max.load.tasks}}, the load operation can break the
> tasks into batches of approximately {{hive.repl.approx.max.load.tasks}}
> (not a hard limit).
> In this case, repl_txn_map can be cleaned prematurely and transactions
> aborted in the middle of replication, because clearDanglingTxnTaskTask
> runs between the batches rather than only once, at the end of the load
> cycle.
> Fix: add an additional check, i.e.:
> {code:java}
> boolean hasPendingIncrementalWork =
>     builder.hasMoreWork() || work.hasBootstrapLoadTasks();
> if (conf.getBoolVar(HiveConf.ConfVars.HIVE_REPL_CLEAR_DANGLING_TXNS_ON_TARGET)
>     && !hasPendingIncrementalWork) {
>   // add clearDanglingTxnTaskTask exactly as in the existing block
> }
> {code}
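
The effect of the proposed check can be illustrated with a small toy model. This is a sketch only, not Hive code: the class `BatchedLoadSketch`, the method `cleanupPoints`, and the flag `guardOnPendingWork` are hypothetical names, and the batch size is treated as a hard cap even though the issue notes that `hive.repl.approx.max.load.tasks` is only approximate. The `hasPendingWork` variable stands in for `builder.hasMoreWork() || work.hasBootstrapLoadTasks()`.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a batched incremental load; hypothetical, not Hive code. */
public class BatchedLoadSketch {

    /**
     * Simulates one load cycle of totalEvents events, split into batches of at
     * most maxTasksPerBatch, and returns the 1-based batch numbers after which
     * the dangling-transaction cleanup would run.
     */
    static List<Integer> cleanupPoints(int totalEvents, int maxTasksPerBatch,
                                       boolean guardOnPendingWork) {
        if (maxTasksPerBatch <= 0) {
            throw new IllegalArgumentException("maxTasksPerBatch must be positive");
        }
        List<Integer> cleanedAfterBatch = new ArrayList<>();
        int remaining = totalEvents;
        int batch = 0;
        do {
            batch++;
            remaining -= Math.min(remaining, maxTasksPerBatch);
            // Stand-in for builder.hasMoreWork() || work.hasBootstrapLoadTasks().
            boolean hasPendingWork = remaining > 0;
            // Without the guard, cleanup is scheduled after every batch;
            // with it, only once no work is left in this load cycle.
            if (!guardOnPendingWork || !hasPendingWork) {
                cleanedAfterBatch.add(batch);
            }
        } while (remaining > 0);
        return cleanedAfterBatch;
    }

    public static void main(String[] args) {
        System.out.println("unguarded: " + cleanupPoints(25, 10, false)); // [1, 2, 3]
        System.out.println("guarded:   " + cleanupPoints(25, 10, true));  // [3]
    }
}
```

With 25 events and batches of 10, the unguarded variant schedules the cleanup after every batch, which corresponds to the premature cleanup described in the issue. The guarded variant schedules it once, after the final batch.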



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
