sankarh commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259235570
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ReplCopyTask.java
 ##########
 @@ -271,12 +302,13 @@ public String getName() {
     LOG.debug("ReplCopyTask:getLoadCopyTask: {}=>{}", srcPath, dstPath);
     if ((replicationSpec != null) && replicationSpec.isInReplicationScope()){
       ReplCopyWork rcwork = new ReplCopyWork(srcPath, dstPath, false);
-      if (replicationSpec.isReplace() && 
conf.getBoolVar(REPL_ENABLE_MOVE_OPTIMIZATION)) {
+      if (replicationSpec.isReplace() && 
(conf.getBoolVar(REPL_ENABLE_MOVE_OPTIMIZATION) || copyToMigratedTxnTable)) {
         rcwork.setDeleteDestIfExist(true);
         rcwork.setAutoPurge(isAutoPurge);
         rcwork.setNeedRecycle(needRecycle);
       }
       rcwork.setCopyToMigratedTxnTable(copyToMigratedTxnTable);
+      rcwork.setCheckDuplicateCopy(replicationSpec.needDupCopyCheck());
 
 Review comment:
   needDupCopyCheck seems to be a generic flag but we handle only for migration 
case. Instead, shall change it to isFirstIncAfterBootstrap which makes it clear 
that for some flow, need to handle this case differently.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

Reply via email to