[ 
https://issues.apache.org/jira/browse/GOBBLIN-1973?focusedWorklogId=895296&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-895296
 ]

ASF GitHub Bot logged work on GOBBLIN-1973:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 12/Dec/23 23:02
            Start Date: 12/Dec/23 23:02
    Worklog Time Spent: 10m 
      Work Description: phet commented on code in PR #3845:
URL: https://github.com/apache/gobblin/pull/3845#discussion_r1424664958


##########
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/ManifestBasedDataset.java:
##########
@@ -176,10 +176,11 @@ public Iterator<FileSet<CopyEntity>> 
getFileSetIterator(FileSystem targetFs, Cop
 
   private static boolean shouldCopy(FileSystem targetFs, FileStatus 
fileInSource, FileStatus fileInTarget, OwnerAndPermission replicatedPermission)
       throws IOException {
-    if (fileInSource.isDirectory() || fileInSource.getModificationTime() == 
fileInTarget.getModificationTime()) {
-      // if source is dir or source and dst has same version, we compare the 
permission to determine whether it needs another sync
+    if (fileInSource.isDirectory() || fileInSource.getModificationTime() <= 
fileInTarget.getModificationTime()) {
+      // even if destination has a newer version than the source, we still 
copy the file if the owner or permission is different
       return !replicatedPermission.hasSameOwnerAndPermission(targetFs, 
fileInTarget);
     }
-    return fileInSource.getModificationTime() > 
fileInTarget.getModificationTime();
+    // Source is newer than the target, must copy
+    return true;

Review Comment:
   would this be equivalent:
   ```
   return fileInSrc.modT() > fileInTarg.modT() || 
!replicatedPermission.hasSameOwnerAndPerms(targFs, fileInTarget);
   ```
   ?



##########
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/ManifestBasedDataset.java:
##########
@@ -176,10 +176,11 @@ public Iterator<FileSet<CopyEntity>> 
getFileSetIterator(FileSystem targetFs, Cop
 
   private static boolean shouldCopy(FileSystem targetFs, FileStatus 
fileInSource, FileStatus fileInTarget, OwnerAndPermission replicatedPermission)
       throws IOException {
-    if (fileInSource.isDirectory() || fileInSource.getModificationTime() == 
fileInTarget.getModificationTime()) {
-      // if source is dir or source and dst has same version, we compare the 
permission to determine whether it needs another sync
+    if (fileInSource.isDirectory() || fileInSource.getModificationTime() <= 
fileInTarget.getModificationTime()) {
+      // even if destination has a newer version than the source, we still 
copy the file if the owner or permission is different
       return !replicatedPermission.hasSameOwnerAndPermission(targetFs, 
fileInTarget);
     }
-    return fileInSource.getModificationTime() > 
fileInTarget.getModificationTime();
+    // Source is newer than the target, must copy
+    return true;

Review Comment:
   would this be equivalent:
   ```
   return fileInSrc.modT() > fileInTarg.modT()
       || !replicatedPermission.hasSameOwnerAndPerms(targFs, fileInTarget);
   ```
   ?





Issue Time Tracking
-------------------

    Worklog Id:     (was: 895296)
    Time Spent: 0.5h  (was: 20m)

> Manifest file copy ignores copying files when source file permissions differ 
> from dest
> --------------------------------------------------------------------------------------
>
>                 Key: GOBBLIN-1973
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1973
>             Project: Apache Gobblin
>          Issue Type: Bug
>          Components: gobblin-core
>            Reporter: William Lo
>            Assignee: Abhishek Tiwari
>            Priority: Major
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When Manifest distcp fails to set permissions for any reason, we would expect 
> a retry of the copy to allow the datasets to reach eventual consistency. 
> However, the current logic of checking if a file is eligible for copy is to 
> check if the source file is newer, if so copy, and if the source file is 
> exactly the same modification time as the destination file, also copy if the 
> permissions are different. However, in most cases the destination will have a 
> newer modification time once it is written and set permissions, so the source 
> file will generally be older but we still want the permissions to match.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to