Will-Lo commented on code in PR #3568:
URL: https://github.com/apache/gobblin/pull/3568#discussion_r978345925
##########
gobblin-data-management/src/test/java/org/apache/gobblin/data/management/copy/RecursiveCopyableDatasetTest.java:
##########
@@ -326,7 +326,7 @@ public TestRecursiveCopyableDataset(Path source, Path target, List<FileStatus> s
@Override
protected List<FileStatus> getFilesAtPath(FileSystem fs, Path path, PathFilter fileFilter)
- throws IOException {
+ throws RuntimeException {
Review Comment:
Declaring `throws RuntimeException` on a function is not standard practice. See:
http://www.javapractices.com/topic/TopicAction.do?Id=129
Java does not enforce it: RuntimeException and its descendants are unchecked
and are generally treated as non-recoverable, so callers are not required to
handle them explicitly.
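To illustrate the difference, here is a minimal sketch. The class and method names are made up for this example and are not Gobblin code. The checked `IOException` has to appear in the `throws` clause, while the unchecked exception compiles without one.

```java
import java.io.IOException;

class ThrowsClauseSketch {
  // Checked exception: the compiler requires the throws clause here.
  static void readChecked(boolean fail) throws IOException {
    if (fail) {
      throw new IOException("checked failure");
    }
  }

  // Unchecked exception: no throws clause is needed, and callers
  // are not forced to catch anything.
  static void readUnchecked(boolean fail) {
    if (fail) {
      throw new IllegalStateException("unchecked failure");
    }
  }

  public static void main(String[] args) {
    try {
      readChecked(true);
    } catch (IOException e) {
      System.out.println("caught: " + e.getMessage());
    }
    // Compiles without try/catch and without a throws clause on main.
    readUnchecked(false);
  }
}
```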
##########
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/RecursiveCopyableDataset.java:
##########
@@ -195,22 +196,28 @@ public Collection<? extends CopyEntity> getCopyableFiles(FileSystem targetFs, Co
Map<Path, FileStatus> filesInSource =
createPathMap(getFilesAtPath(this.fs, this.rootPath, this.pathFilter), this.rootPath);
- Map<Path, FileStatus> filesInTarget =
- createPathMap(getFilesAtPath(targetFs, targetPath, this.pathFilter), targetPath);
+
+ // Allow fileNotFoundException for filesInTarget since if it doesn't exist, they will be created.
+ List<FileStatus> filesAtPath = Lists.newArrayList();
+ try {
+ filesAtPath = getFilesAtPath(targetFs, targetPath, this.pathFilter);
+ } catch (FileNotFoundException e) {
+ log.info(String.format("Could not find any files on targetFs %s path %s.", targetFs.getUri(), targetPath));
+ }
+ Map<Path, FileStatus> filesInTarget = createPathMap(filesAtPath, targetPath);
return getCopyableFilesImpl(configuration, filesInSource, filesInTarget, targetFs,
nonGlobSearchPath, configuration.getPublishDir(), targetPath);
}
@VisibleForTesting
protected List<FileStatus> getFilesAtPath(FileSystem fs, Path path, PathFilter fileFilter)
- throws IOException {
+ throws FileNotFoundException {
try {
return FileListUtils
.listFilesToCopyAtPath(fs, path, fileFilter, applyFilterToDirectories, includeEmptyDirectories);
} catch (IOException e) {
- log.warn(String.format("Could not find any files on fs %s path %s due to the following exception. Returning an empty list of files.", fs.getUri(), path), e);
- return Lists.newArrayList();
+ throw new FileNotFoundException(String.format("Could not find any files on fs %s path %s.", fs.getUri(), path));
}
Review Comment:
Sorry if I was misleading earlier. I thought about it some more, and I think
we need to be cautious here: we actually want the reverse of what you have.
The function should catch the FileNotFoundException here silently, which is
the old behavior, and return the empty list `filesAtPath`, since otherwise it
would cause all pipelines with one missing target folder to perform a full
copy instead of an incremental copy. The tradeoff is that the source FS will
still fail silently if the folder is missing on the source.
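Roughly, the shape I have in mind is sketched below. This is only an illustration: `Lister` is a hypothetical stand-in for the real listing call, paths are plain strings, and letting other IOExceptions propagate is an assumption of this sketch rather than something decided in this PR.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class EmptyOnMissingSketch {
  // Hypothetical stand-in for the listing call; not a Gobblin or Hadoop API.
  interface Lister {
    List<String> list(String path) throws IOException;
  }

  // Returns an empty list when the path does not exist.
  // Assumption of this sketch: any other IOException still propagates.
  static List<String> getFilesAtPath(Lister lister, String path) throws IOException {
    try {
      return lister.list(path);
    } catch (FileNotFoundException e) {
      return new ArrayList<>();
    }
  }

  public static void main(String[] args) throws IOException {
    Lister missing = p -> {
      throw new FileNotFoundException(p);
    };
    // The missing path is reported as "no files" instead of failing.
    System.out.println(getFilesAtPath(missing, "/missing/target").size());
  }
}
```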
##########
gobblin-data-management/src/test/java/org/apache/gobblin/data/management/copy/RecursiveCopyableDatasetTest.java:
##########
@@ -326,7 +326,7 @@ public TestRecursiveCopyableDataset(Path source, Path target, List<FileStatus> s
@Override
protected List<FileStatus> getFilesAtPath(FileSystem fs, Path path, PathFilter fileFilter)
- throws IOException {
+ throws RuntimeException {
Review Comment:
Since this is a test function, though, you can probably just have it throw
IOException instead, or have it match the function definition without
actually throwing.
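Both options compile in Java, since an override may keep the parent's `throws` clause or declare fewer checked exceptions. A minimal sketch with made-up class names, assuming a parent method that declares `IOException`:

```java
import java.io.IOException;
import java.util.Collections;
import java.util.List;

class OverrideThrowsSketch {
  // Hypothetical parent; stands in for the class under test.
  static class Base {
    protected List<String> getFilesAtPath(String path) throws IOException {
      throw new IOException("not implemented in this sketch");
    }
  }

  // Option 1: keep the parent's throws clause even though nothing is thrown.
  static class SameSignature extends Base {
    @Override
    protected List<String> getFilesAtPath(String path) throws IOException {
      return Collections.singletonList(path);
    }
  }

  // Option 2: drop the throws clause entirely.
  static class NoThrows extends Base {
    @Override
    protected List<String> getFilesAtPath(String path) {
      return Collections.singletonList(path);
    }
  }
}
```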
##########
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/RecursiveCopyableDataset.java:
##########
@@ -195,22 +196,28 @@ public Collection<? extends CopyEntity> getCopyableFiles(FileSystem targetFs, Co
Map<Path, FileStatus> filesInSource =
createPathMap(getFilesAtPath(this.fs, this.rootPath, this.pathFilter), this.rootPath);
- Map<Path, FileStatus> filesInTarget =
- createPathMap(getFilesAtPath(targetFs, targetPath, this.pathFilter), targetPath);
+
+ // Allow fileNotFoundException for filesInTarget since if it doesn't exist, they will be created.
+ List<FileStatus> filesAtPath = Lists.newArrayList();
+ try {
+ filesAtPath = getFilesAtPath(targetFs, targetPath, this.pathFilter);
+ } catch (FileNotFoundException e) {
+ log.info(String.format("Could not find any files on targetFs %s path %s.", targetFs.getUri(), targetPath));
+ }
+ Map<Path, FileStatus> filesInTarget = createPathMap(filesAtPath, targetPath);
return getCopyableFilesImpl(configuration, filesInSource, filesInTarget, targetFs,
nonGlobSearchPath, configuration.getPublishDir(), targetPath);
}
@VisibleForTesting
protected List<FileStatus> getFilesAtPath(FileSystem fs, Path path, PathFilter fileFilter)
- throws IOException {
+ throws FileNotFoundException {
try {
return FileListUtils
.listFilesToCopyAtPath(fs, path, fileFilter, applyFilterToDirectories, includeEmptyDirectories);
} catch (IOException e) {
- log.warn(String.format("Could not find any files on fs %s path %s due to the following exception. Returning an empty list of files.", fs.getUri(), path), e);
- return Lists.newArrayList();
+ throw new FileNotFoundException(String.format("Could not find any files on fs %s path %s.", fs.getUri(), path));
}
Review Comment:
Given that this is the current behavior for the other Gobblin pipelines (if
the source does not exist, do not fail but report no work done), let's just go
with this for now. If we want to fail loudly when no workunits are collected,
we should handle it holistically at a higher level.