Will-Lo commented on code in PR #3835:
URL: https://github.com/apache/gobblin/pull/3835#discussion_r1401365150
##########
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/iceberg/IcebergDataset.java:
##########
@@ -63,10 +63,8 @@
@Slf4j
@Getter
public class IcebergDataset implements PrioritizedCopyableDataset {
- private final String dbName;
- private final String inputTableName;
private final IcebergTable srcIcebergTable;
- /** Presumed destination {@link IcebergTable} exists */
+ /* CAUTION: *hopefully* `destIcebergTable` exists... although that's not
necessarily been verified yet */
Review Comment:
Might be out of scope for this PR, but shouldn't our default behavior not
have that assumption and create the table if missing? That's the contract for
Hive distcp
##########
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/iceberg/IcebergDataset.java:
##########
@@ -117,17 +113,17 @@ public Iterator<FileSet<CopyEntity>>
getFileSetIterator(FileSystem targetFs, Cop
return createFileSets(targetFs, configuration);
}
- /** @return unique ID for this dataset, usable as a {@link
CopyEntity}.fileset, for atomic publication grouping */
+ /** @return unique ID for dataset (based on the source-side table), usable
as a {@link CopyEntity#getFileSet}, for atomic publication grouping */
protected String getFileSetId() {
Review Comment:
Nit: I feel like this is better encapsulated as getSourceTableId because
fileSetId() makes me think about some subset of the table, not just the src
table.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]