Will-Lo commented on code in PR #3835:
URL: https://github.com/apache/gobblin/pull/3835#discussion_r1401365150


##########
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/iceberg/IcebergDataset.java:
##########
@@ -63,10 +63,8 @@
 @Slf4j
 @Getter
 public class IcebergDataset implements PrioritizedCopyableDataset {
-  private final String dbName;
-  private final String inputTableName;
   private final IcebergTable srcIcebergTable;
-  /** Presumed destination {@link IcebergTable} exists */
+  /* CAUTION: *hopefully* `destIcebergTable` exists... although that's not 
necessarily been verified yet */

Review Comment:
   Might be out of scope for this PR, but shouldn't our default behavior not 
have that assumption and create the table if missing? That's the contract for 
Hive distcp



##########
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/iceberg/IcebergDataset.java:
##########
@@ -117,17 +113,17 @@ public Iterator<FileSet<CopyEntity>> 
getFileSetIterator(FileSystem targetFs, Cop
     return createFileSets(targetFs, configuration);
   }
 
-  /** @return unique ID for this dataset, usable as a {@link 
CopyEntity}.fileset, for atomic publication grouping */
+  /** @return unique ID for dataset (based on the source-side table), usable 
as a {@link CopyEntity#getFileSet}, for atomic publication grouping */
   protected String getFileSetId() {

Review Comment:
   Nit: I feel like this is better encapsulated as getSourceTableId because 
fileSetId() makes me think about some subset of the table, not just the src 
table.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to