Harshal Patel created HIVE-29459:
------------------------------------
Summary: [DR][ACIDReplication] Add clearDanglingTxnTaskTask at the end
Key: HIVE-29459
URL: https://issues.apache.org/jira/browse/HIVE-29459
Project: Hive
Issue Type: Bug
Components: repl
Affects Versions: 4.2.0
Reporter: Harshal Patel
Assignee: Harshal Patel
Currently, clearDanglingTxnTaskTask is added at the end of ReplLoadTask. That
works in the normal scenario:
{code:java}
if (conf.getBoolVar(HiveConf.ConfVars.HIVE_REPL_CLEAR_DANGLING_TXNS_ON_TARGET)) {
  ClearDanglingTxnWork clearDanglingTxnWork =
      new ClearDanglingTxnWork(work.getDumpDirectory(), targetDb.getName());
  Task<ClearDanglingTxnWork> clearDanglingTxnTaskTask =
      TaskFactory.get(clearDanglingTxnWork, conf);
  if (childTasks.isEmpty()) {
    childTasks.add(clearDanglingTxnTaskTask);
  } else {
    DAGTraversal.traverse(childTasks,
        new AddDependencyToLeaves(Collections.singletonList(clearDanglingTxnTaskTask)));
  }
}
return 0;
{code}
[https://github.com/apache/hive/blob/38a963540000729f0ac8e8d2ac9cd1ca22930d2a/ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java#L966]
But if the number of events for an incremental load exceeds
{{hive.repl.approx.max.load.tasks}}, the load operation can break the tasks
into batches of approximately {{hive.repl.approx.max.load.tasks}} (not a hard
limit).
In this case, {{repl_txn_map}} can be cleaned prematurely and transactions
aborted in the middle of replication, because clearDanglingTxnTaskTask gets
called between batches rather than only once at the end of each load cycle.
Fix: add an additional check, i.e.:
{code:java}
boolean hasPendingIncrementalWork = builder.hasMoreWork() || work.hasBootstrapLoadTasks();
if (conf.getBoolVar(HiveConf.ConfVars.HIVE_REPL_CLEAR_DANGLING_TXNS_ON_TARGET)
    && !hasPendingIncrementalWork) {
{code}
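To illustrate why the extra check matters, here is a minimal, self-contained simulation of a batched load cycle. It is only a sketch and not Hive code: the class {{DanglingTxnCleanupSketch}}, the method {{cleanupBatches}}, and the fixed batch size are hypothetical stand-ins (the real limit is approximate), and the local {{hasPendingWork}} flag plays the role of {{hasPendingIncrementalWork}} above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the batched load loop; not actual Hive code. */
class DanglingTxnCleanupSketch {

  /**
   * Simulates one load cycle over totalEvents events, processed in batches of
   * at most batchSize tasks. Returns the batch numbers after which the
   * cleanup task would be scheduled.
   */
  static List<Integer> cleanupBatches(int totalEvents, int batchSize, boolean guardEnabled) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    List<Integer> scheduledAfterBatch = new ArrayList<>();
    int remaining = totalEvents;
    int batch = 0;
    do {
      batch++;
      remaining -= Math.min(batchSize, remaining);
      // Assumption: "more events left" models builder.hasMoreWork() || work.hasBootstrapLoadTasks().
      boolean hasPendingWork = remaining > 0;
      // Without the guard, cleanup is scheduled after every batch.
      if (!guardEnabled || !hasPendingWork) {
        scheduledAfterBatch.add(batch);
      }
    } while (remaining > 0);
    return scheduledAfterBatch;
  }

  public static void main(String[] args) {
    System.out.println("unguarded: " + cleanupBatches(10, 4, false)); // [1, 2, 3]
    System.out.println("guarded:   " + cleanupBatches(10, 4, true));  // [3]
  }
}
```

With 10 events and batches of 4, the unguarded variant schedules the cleanup after each of the three batches, while the guarded variant schedules it only once, after the last batch. When everything fits into a single batch, both variants behave the same, which matches the observation that the current code works in the normal scenario.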