sankarh commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259770370
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ReplCopyTask.java
 ##########
 @@ -61,6 +62,21 @@ public ReplCopyTask(){
     super();
   }
 
+  // If file is already present in base directory, then remove it from the list.
+  // Check  HIVE-21197 for more detail
+  private void updateSrcFileListForDupCopy(FileSystem dstFs, Path toPath, List<ReplChangeManager.FileInfo> srcFiles,
+                                           long writeId, int stmtId) throws IOException {
+    ListIterator<ReplChangeManager.FileInfo> iter = srcFiles.listIterator();
+    Path basePath = new Path(toPath, AcidUtils.baseOrDeltaSubdir(true, writeId, writeId, stmtId));
+    while (iter.hasNext()) {
+      Path filePath = new Path(basePath, iter.next().getSourcePath().getName());
+      if (dstFs.exists(filePath)) {
 
 Review comment:
   OK
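
For readers of the archive: the hunk above is cut off at the `exists` check, so the body of that branch is not shown here. The sketch below illustrates the general pattern the code comment describes (drop a source file from the copy list when a file of the same name is already present in the destination directory). It is a stand-in, not the code from the PR: it uses `java.nio.file` instead of Hadoop's `FileSystem`, the class and method names (`DupCopyFilter`, `removeAlreadyCopied`) are hypothetical, and the use of `iter.remove()` inside the branch is an assumption.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.ListIterator;

public class DupCopyFilter {

  // Hypothetical helper (not part of Hive): removes every entry from srcFiles
  // whose file name already exists directly under destDir.
  public static void removeAlreadyCopied(Path destDir, List<Path> srcFiles) {
    ListIterator<Path> iter = srcFiles.listIterator();
    while (iter.hasNext()) {
      // Build the path the file would have after the copy.
      Path candidate = destDir.resolve(iter.next().getFileName());
      if (Files.exists(candidate)) {
        // Assumption: a file that is already present is skipped so it is not copied twice.
        iter.remove();
      }
    }
  }

  public static void main(String[] args) throws IOException {
    // Simulate a destination directory that already holds one of the two files.
    Path dest = Files.createTempDirectory("base_dir");
    Files.createFile(dest.resolve("000000_0"));

    List<Path> src = new ArrayList<>(
        Arrays.asList(Paths.get("src", "000000_0"), Paths.get("src", "000001_0")));

    removeAlreadyCopied(dest, src);
    // Only the entry for 000001_0 should remain in the list.
    System.out.println(src);
  }
}
```

Removing through the `ListIterator` avoids the `ConcurrentModificationException` that calling `srcFiles.remove(...)` inside a for-each loop over the same list would raise.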
