[
https://issues.apache.org/jira/browse/GOBBLIN-1973?focusedWorklogId=895296&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-895296
]
ASF GitHub Bot logged work on GOBBLIN-1973:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 12/Dec/23 23:02
Start Date: 12/Dec/23 23:02
Worklog Time Spent: 10m
Work Description: phet commented on code in PR #3845:
URL: https://github.com/apache/gobblin/pull/3845#discussion_r1424664958
##########
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/ManifestBasedDataset.java:
##########
@@ -176,10 +176,11 @@ public Iterator<FileSet<CopyEntity>>
getFileSetIterator(FileSystem targetFs, Cop
private static boolean shouldCopy(FileSystem targetFs, FileStatus
fileInSource, FileStatus fileInTarget, OwnerAndPermission replicatedPermission)
throws IOException {
- if (fileInSource.isDirectory() || fileInSource.getModificationTime() ==
fileInTarget.getModificationTime()) {
- // if source is dir or source and dst has same version, we compare the
permission to determine whether it needs another sync
+ if (fileInSource.isDirectory() || fileInSource.getModificationTime() <=
fileInTarget.getModificationTime()) {
+ // even if destination has a newer version than the source, we still
copy the file if the owner or permission is different
return !replicatedPermission.hasSameOwnerAndPermission(targetFs,
fileInTarget);
}
- return fileInSource.getModificationTime() >
fileInTarget.getModificationTime();
+ // Source is newer than the target, must copy
+ return true;
Review Comment:
would this be equivalent:
```
return fileInSrc.modT() > fileInTarg.modT() ||
!replicatedPermission.hasSameOwnerAndPerms(targFs, fileInTarget);
```
?
##########
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/ManifestBasedDataset.java:
##########
@@ -176,10 +176,11 @@ public Iterator<FileSet<CopyEntity>>
getFileSetIterator(FileSystem targetFs, Cop
private static boolean shouldCopy(FileSystem targetFs, FileStatus
fileInSource, FileStatus fileInTarget, OwnerAndPermission replicatedPermission)
throws IOException {
- if (fileInSource.isDirectory() || fileInSource.getModificationTime() ==
fileInTarget.getModificationTime()) {
- // if source is dir or source and dst has same version, we compare the
permission to determine whether it needs another sync
+ if (fileInSource.isDirectory() || fileInSource.getModificationTime() <=
fileInTarget.getModificationTime()) {
+ // even if destination has a newer version than the source, we still
copy the file if the owner or permission is different
return !replicatedPermission.hasSameOwnerAndPermission(targetFs,
fileInTarget);
}
- return fileInSource.getModificationTime() >
fileInTarget.getModificationTime();
+ // Source is newer than the target, must copy
+ return true;
Review Comment:
would this be equivalent:
```
return fileInSrc.modT() > fileInTarg.modT()
|| !replicatedPermission.hasSameOwnerAndPerms(targFs, fileInTarget);
```
?
Issue Time Tracking
-------------------
Worklog Id: (was: 895296)
Time Spent: 0.5h (was: 20m)
> Manifest file copy ignores copying files when source file permissions differ
> from dest
> --------------------------------------------------------------------------------------
>
> Key: GOBBLIN-1973
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1973
> Project: Apache Gobblin
> Issue Type: Bug
> Components: gobblin-core
> Reporter: William Lo
> Assignee: Abhishek Tiwari
> Priority: Major
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> When Manifest distcp fails to set permissions for any reason, we would expect
> a retry of the copy to allow the datasets to reach eventual consistency.
> However, the current logic of checking if a file is eligible for copy is to
> check if the source file is newer, if so copy, and if the source file is
> exactly the same modification time as the destination file, also copy if the
> permissions are different. However, in most cases the destination will have a
> newer modification time once it is written and set permissions, so the source
> file will generally be older but we still want the permissions to match.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)