sankarh commented on a change in pull request #541: HIVE-21197 : Hive
Replication can add duplicate data during migration to a target with
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259235570
##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ReplCopyTask.java
##########
@@ -271,12 +302,13 @@ public String getName() {
LOG.debug("ReplCopyTask:getLoadCopyTask: {}=>{}", srcPath, dstPath);
if ((replicationSpec != null) && replicationSpec.isInReplicationScope()){
ReplCopyWork rcwork = new ReplCopyWork(srcPath, dstPath, false);
- if (replicationSpec.isReplace() && conf.getBoolVar(REPL_ENABLE_MOVE_OPTIMIZATION)) {
+ if (replicationSpec.isReplace() && (conf.getBoolVar(REPL_ENABLE_MOVE_OPTIMIZATION) || copyToMigratedTxnTable)) {
rcwork.setDeleteDestIfExist(true);
rcwork.setAutoPurge(isAutoPurge);
rcwork.setNeedRecycle(needRecycle);
}
rcwork.setCopyToMigratedTxnTable(copyToMigratedTxnTable);
+ rcwork.setCheckDuplicateCopy(replicationSpec.needDupCopyCheck());
Review comment:
needDupCopyCheck seems to be a generic flag, but we handle it only for the
migration case. Instead, shall we rename it to isFirstIncAfterBootstrap? That
makes it clear that this case must be handled differently for one specific flow.
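A minimal Java sketch of the suggested rename follows. It assumes simplified stand-ins for ReplicationSpec and ReplCopyWork. The class DupCopyCheckSketch, its nested types ReplSpec and CopyWork, and their fields are hypothetical and are not Hive's real API. The sketch mirrors the new-side logic of the hunk above, with needDupCopyCheck() replaced by the proposed isFirstIncAfterBootstrap(). As a result, the call site states which replication flow is running instead of toggling a generic switch. setAutoPurge and setNeedRecycle are omitted.

```java
// Hypothetical, simplified stand-ins for Hive's ReplicationSpec and ReplCopyWork.
// Only the flags touched by the reviewed hunk are modelled; this is not Hive's API.
public class DupCopyCheckSketch {

    /** Stand-in for ReplicationSpec, exposing the flag under its suggested name. */
    static final class ReplSpec {
        private final boolean replace;
        // Proposed name: states which flow needs special handling (the first
        // incremental load after a bootstrap) instead of a generic "needDupCopyCheck".
        private final boolean firstIncAfterBootstrap;

        ReplSpec(boolean replace, boolean firstIncAfterBootstrap) {
            this.replace = replace;
            this.firstIncAfterBootstrap = firstIncAfterBootstrap;
        }

        boolean isReplace() {
            return replace;
        }

        boolean isFirstIncAfterBootstrap() {
            return firstIncAfterBootstrap;
        }
    }

    /** Stand-in for ReplCopyWork: records only the flags the copy task would act on. */
    static final class CopyWork {
        boolean deleteDestIfExist;
        boolean copyToMigratedTxnTable;
        boolean checkDuplicateCopy;
    }

    /**
     * Mirrors the in-replication-scope branch of the hunk (new side).
     * setAutoPurge/setNeedRecycle are left out for brevity.
     */
    static CopyWork configure(ReplSpec spec, boolean moveOptimization, boolean copyToMigratedTxnTable) {
        CopyWork work = new CopyWork();
        // A replace into a migrated transactional table clears the destination
        // even when the move optimization is disabled.
        if (spec.isReplace() && (moveOptimization || copyToMigratedTxnTable)) {
            work.deleteDestIfExist = true;
        }
        work.copyToMigratedTxnTable = copyToMigratedTxnTable;
        // Call site after the suggested rename: the reader sees when the check applies.
        work.checkDuplicateCopy = spec.isFirstIncAfterBootstrap();
        return work;
    }

    public static void main(String[] args) {
        // Replace-mode load into a migrated transactional table, move optimization off.
        CopyWork work = configure(new ReplSpec(true, true), false, true);
        System.out.println(work.deleteDestIfExist + " " + work.checkDuplicateCopy); // prints "true true"
    }
}
```

The sketch keeps the duplicate-copy flag separate from the delete-destination decision, as the hunk does. The rename changes only how the call site reads, not when the check fires.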
----------------------------------------------------------------
This is an automated message from the Apache Git Service.