Will-Lo commented on code in PR #3568:
URL: https://github.com/apache/gobblin/pull/3568#discussion_r978345925


##########
gobblin-data-management/src/test/java/org/apache/gobblin/data/management/copy/RecursiveCopyableDatasetTest.java:
##########
@@ -326,7 +326,7 @@ public TestRecursiveCopyableDataset(Path source, Path target, List<FileStatus> s
 
     @Override
     protected List<FileStatus> getFilesAtPath(FileSystem fs, Path path, PathFilter fileFilter)
-        throws IOException {
+        throws RuntimeException {

Review Comment:
   Declaring `throws RuntimeException` on a method is not standard practice. See: 
http://www.javapractices.com/topic/TopicAction.do?Id=129
   Java does not enforce it: RuntimeException and its descendants are unchecked 
and generally treated as non-recoverable, so callers are not required to handle 
them explicitly.
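
   To illustrate the point about checked versus unchecked exceptions, here is a 
minimal, self-contained sketch. The class and method names 
(`UncheckedVsChecked`, `readChecked`, `readUnchecked`) are hypothetical and are 
not part of Gobblin; they only show what the compiler does and does not enforce.

```java
import java.io.IOException;

public class UncheckedVsChecked {
  // Checked exception: the compiler forces callers to catch or declare IOException.
  public static String readChecked(boolean fail) throws IOException {
    if (fail) {
      throw new IOException("simulated I/O failure");
    }
    return "ok";
  }

  /**
   * Unchecked exception: no throws clause is required.
   * Documenting it in Javadoc is usually enough.
   *
   * @throws IllegalStateException if {@code fail} is true
   */
  public static String readUnchecked(boolean fail) {
    if (fail) {
      throw new IllegalStateException("simulated failure");
    }
    return "ok";
  }

  public static void main(String[] args) {
    // Compiles without try/catch because IllegalStateException is unchecked.
    System.out.println(readUnchecked(false));

    // Does not compile without try/catch (or a throws clause on main).
    try {
      System.out.println(readChecked(false));
    } catch (IOException e) {
      System.out.println("caught: " + e.getMessage());
    }
  }
}
```

   Adding `throws RuntimeException` to `readUnchecked` would compile, but it 
would not change what callers are required to do.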



##########
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/RecursiveCopyableDataset.java:
##########
@@ -195,22 +196,28 @@ public Collection<? extends CopyEntity> getCopyableFiles(FileSystem targetFs, Co
 
     Map<Path, FileStatus> filesInSource =
         createPathMap(getFilesAtPath(this.fs, this.rootPath, this.pathFilter), this.rootPath);
-    Map<Path, FileStatus> filesInTarget =
-        createPathMap(getFilesAtPath(targetFs, targetPath, this.pathFilter), targetPath);
+
+    // Allow fileNotFoundException for filesInTarget since if it doesn't exist, they will be created.
+    List<FileStatus> filesAtPath = Lists.newArrayList();
+    try {
+      filesAtPath = getFilesAtPath(targetFs, targetPath, this.pathFilter);
+    } catch (FileNotFoundException e) {
+      log.info(String.format("Could not find any files on targetFs %s path %s.", targetFs.getUri(), targetPath));
+    }
+    Map<Path, FileStatus> filesInTarget = createPathMap(filesAtPath, targetPath);
 
     return getCopyableFilesImpl(configuration, filesInSource, filesInTarget, targetFs,
             nonGlobSearchPath, configuration.getPublishDir(), targetPath);
   }
 
   @VisibleForTesting
   protected List<FileStatus> getFilesAtPath(FileSystem fs, Path path, PathFilter fileFilter)
-      throws IOException {
+      throws FileNotFoundException {
     try {
       return FileListUtils
           .listFilesToCopyAtPath(fs, path, fileFilter, applyFilterToDirectories, includeEmptyDirectories);
     } catch (IOException e) {
-      log.warn(String.format("Could not find any files on fs %s path %s due to the following exception. Returning an empty list of files.", fs.getUri(), path), e);
-      return Lists.newArrayList();
+      throw new FileNotFoundException(String.format("Could not find any files on fs %s path %s.", fs.getUri(), path));
     }

Review Comment:
   Sorry if I was misleading earlier. I thought about it some more, and I think 
we need to be cautious here. We actually want the reverse of what you have: 
this function should catch the FileNotFoundException here silently, which is 
the old behavior, and return the empty list `filesAtPath`. Otherwise, all 
pipelines with one missing target folder would perform a full copy instead of 
an incremental copy.
   
   The tradeoff is that the sourceFS will still fail silently if the folder is 
missing on the source. 
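
   One possible reading of this suggestion, as a rough sketch rather than the 
actual patch: the listing helper catches FileNotFoundException itself and 
returns an empty list. The `Lister` interface and the String-based paths are 
hypothetical stand-ins for `FileListUtils.listFilesToCopyAtPath` and the Hadoop 
types, and letting other IOExceptions propagate is an assumption on my part.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ListOrEmptySketch {
  /** Hypothetical stand-in for the real file-listing call. */
  public interface Lister {
    List<String> list(String path) throws IOException;
  }

  // Returns an empty list when the path is missing.
  // Assumption: any other IOException is allowed to propagate to the caller.
  public static List<String> getFilesAtPath(Lister lister, String path) throws IOException {
    try {
      return lister.list(path);
    } catch (FileNotFoundException e) {
      // Missing folder: report "no files" instead of failing.
      return new ArrayList<>();
    }
  }

  public static void main(String[] args) throws IOException {
    // A lister that always reports a missing path.
    Lister missing = path -> {
      throw new FileNotFoundException(path);
    };
    // A lister that finds two files under the path.
    Lister present = path -> Arrays.asList(path + "/a", path + "/b");

    System.out.println("missing target: " + getFilesAtPath(missing, "/target").size() + " files");
    System.out.println("existing target: " + getFilesAtPath(present, "/target").size() + " files");
  }
}
```

   With this shape the caller needs no try/catch around the target listing, 
since a missing folder and an empty folder both yield an empty list.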



##########
gobblin-data-management/src/test/java/org/apache/gobblin/data/management/copy/RecursiveCopyableDatasetTest.java:
##########
@@ -326,7 +326,7 @@ public TestRecursiveCopyableDataset(Path source, Path target, List<FileStatus> s
 
     @Override
     protected List<FileStatus> getFilesAtPath(FileSystem fs, Path path, PathFilter fileFilter)
-        throws IOException {
+        throws RuntimeException {

Review Comment:
   Though since this is a test function, you can probably just have it throw 
the IOException instead, or have it match the function definition but not throw.
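
   For reference, a small sketch of the two options. An overriding method may 
keep the parent's throws clause or declare fewer checked exceptions. The 
classes below (`Base`, `SameSignature`, `NoThrows`) are hypothetical and use 
String paths instead of the Hadoop types.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class OverrideThrowsSketch {
  public static class Base {
    public List<String> getFilesAtPath(String path) throws IOException {
      throw new IOException("listing not available in this sketch: " + path);
    }
  }

  // Option 1: keep the same throws clause as the parent, even if nothing is thrown.
  public static class SameSignature extends Base {
    @Override
    public List<String> getFilesAtPath(String path) throws IOException {
      return Arrays.asList(path + "/a");
    }
  }

  // Option 2: drop the throws clause entirely.
  // An override is allowed to declare fewer checked exceptions than its parent.
  public static class NoThrows extends Base {
    @Override
    public List<String> getFilesAtPath(String path) {
      return Arrays.asList(path + "/a");
    }
  }

  public static void main(String[] args) throws IOException {
    // Called through the NoThrows type, no try/catch is needed.
    System.out.println(new NoThrows().getFilesAtPath("/x"));

    // Called through the Base type, the checked exception must still be handled.
    Base viaBase = new SameSignature();
    System.out.println(viaBase.getFilesAtPath("/x"));
  }
}
```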



##########
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/RecursiveCopyableDataset.java:
##########
@@ -195,22 +196,28 @@ public Collection<? extends CopyEntity> getCopyableFiles(FileSystem targetFs, Co
 
     Map<Path, FileStatus> filesInSource =
         createPathMap(getFilesAtPath(this.fs, this.rootPath, this.pathFilter), this.rootPath);
-    Map<Path, FileStatus> filesInTarget =
-        createPathMap(getFilesAtPath(targetFs, targetPath, this.pathFilter), targetPath);
+
+    // Allow fileNotFoundException for filesInTarget since if it doesn't exist, they will be created.
+    List<FileStatus> filesAtPath = Lists.newArrayList();
+    try {
+      filesAtPath = getFilesAtPath(targetFs, targetPath, this.pathFilter);
+    } catch (FileNotFoundException e) {
+      log.info(String.format("Could not find any files on targetFs %s path %s.", targetFs.getUri(), targetPath));
+    }
+    Map<Path, FileStatus> filesInTarget = createPathMap(filesAtPath, targetPath);
 
     return getCopyableFilesImpl(configuration, filesInSource, filesInTarget, targetFs,
             nonGlobSearchPath, configuration.getPublishDir(), targetPath);
   }
 
   @VisibleForTesting
   protected List<FileStatus> getFilesAtPath(FileSystem fs, Path path, PathFilter fileFilter)
-      throws IOException {
+      throws FileNotFoundException {
     try {
       return FileListUtils
           .listFilesToCopyAtPath(fs, path, fileFilter, applyFilterToDirectories, includeEmptyDirectories);
     } catch (IOException e) {
-      log.warn(String.format("Could not find any files on fs %s path %s due to the following exception. Returning an empty list of files.", fs.getUri(), path), e);
-      return Lists.newArrayList();
+      throw new FileNotFoundException(String.format("Could not find any files on fs %s path %s.", fs.getUri(), path));
     }

Review Comment:
   Given that this is the current behavior for the other Gobblin pipelines (if 
the source does not exist, do not fail but report no work done), let's just go 
with this for now. If we want to fail loudly when no workunits are collected, 
we should handle it holistically at a higher level.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
