ZihanLi58 commented on code in PR #3571:
URL: https://github.com/apache/gobblin/pull/3571#discussion_r984018914


##########
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/iceberg/IcebergDataset.java:
##########
@@ -135,15 +124,27 @@ protected Iterator<FileSet<CopyEntity>> createFileSets(FileSystem targetFs, Copy
    * table replication.
    */
   @VisibleForTesting
-  Collection<CopyEntity> generateCopyEntities(FileSystem targetFs, CopyConfiguration configuration) throws IOException {
+  Collection<CopyEntity> generateCopyEntities(FileSystem targetFs, CopyConfiguration copyConfig) throws IOException {
     String fileSet = this.getFileSetId();
     List<CopyEntity> copyEntities = Lists.newArrayList();
     Map<Path, FileStatus> pathToFileStatus = getFilePathsToFileStatus();
     log.info("{}.{} - found {} candidate source paths", dbName, inputTableName, pathToFileStatus.size());
 
-    for (CopyableFile.Builder builder : getCopyableFilesFromPaths(pathToFileStatus, configuration, targetFs)) {
-      CopyableFile fileEntity =
-          builder.fileSet(fileSet).datasetOutputPath(targetFs.getUri().getPath()).build();
+    Configuration defaultHadoopConfiguration = new Configuration();
+    for (Map.Entry<Path, FileStatus> entry : pathToFileStatus.entrySet()) {
+      Path srcPath = entry.getKey();
+      FileStatus srcFileStatus = entry.getValue();
+      // TODO: determine whether unnecessarily expensive to repeatedly re-create what should be the same FS: could it
+      // instead be created once and reused thereafter?
+      FileSystem actualSourceFs = getSourceFileSystemFromFileStatus(srcFileStatus, defaultHadoopConfiguration);

Review Comment:
   If we can assume that all source files are on the same FS, it should be safe 
to create the FileSystem once and reuse it.
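
   If that assumption might not always hold, one option is to memoize per 
scheme and authority rather than keeping a single instance. A minimal sketch 
of the idea, using only the JDK: `PerAuthorityCache` and its `factory` 
parameter are hypothetical names for illustration and are not part of Gobblin 
or Hadoop.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical helper (not part of Gobblin): keeps one expensive handle per
 * scheme + authority, creating it on first use and reusing it afterwards.
 * Not thread-safe; intended for use inside a single loop.
 */
public class PerAuthorityCache<T> {
  private final Map<String, T> handles = new HashMap<>();
  private final Function<URI, T> factory;

  public PerAuthorityCache(Function<URI, T> factory) {
    this.factory = factory;
  }

  /** Returns the cached handle for the URI's scheme and authority. */
  public T get(URI uri) {
    // The authority is null for URIs such as file:///tmp/x; the key is still stable.
    String key = uri.getScheme() + "://" + uri.getAuthority();
    return handles.computeIfAbsent(key, k -> factory.apply(uri));
  }

  /** Number of distinct handles created so far. */
  public int size() {
    return handles.size();
  }
}
```

   In this PR the factory would wrap the existing call to 
`getSourceFileSystemFromFileStatus`, so the loop would create a new handle 
only when it meets a scheme and authority it has not seen before.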



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
