sankarh commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled URL: https://github.com/apache/hive/pull/541#discussion_r259588369
########## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ReplCopyTask.java ########## @@ -271,12 +299,13 @@ public String getName() { LOG.debug("ReplCopyTask:getLoadCopyTask: {}=>{}", srcPath, dstPath); if ((replicationSpec != null) && replicationSpec.isInReplicationScope()){ ReplCopyWork rcwork = new ReplCopyWork(srcPath, dstPath, false); - if (replicationSpec.isReplace() && conf.getBoolVar(REPL_ENABLE_MOVE_OPTIMIZATION)) { + if (replicationSpec.isReplace() && (conf.getBoolVar(REPL_ENABLE_MOVE_OPTIMIZATION) || copyToMigratedTxnTable)) { rcwork.setDeleteDestIfExist(true); Review comment: If it is insert overwrite case(writeId=2) and let's say few of the listed files are there in base_1. Now, this deleteDestIfExists marks to create base_2 instead of delta_2_2 as it is IOW. So, only partial data would be copied to base_2. After commit of writeId=2, all readers would see partial data from base_2. So, need to handle this case. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services