[
https://issues.apache.org/jira/browse/GOBBLIN-1961?focusedWorklogId=891669&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-891669
]
ASF GitHub Bot logged work on GOBBLIN-1961:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 22/Nov/23 00:33
Start Date: 22/Nov/23 00:33
Worklog Time Spent: 10m
Work Description: Will-Lo commented on code in PR #3835:
URL: https://github.com/apache/gobblin/pull/3835#discussion_r1401365150
##########
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/iceberg/IcebergDataset.java:
##########
@@ -63,10 +63,8 @@
@Slf4j
@Getter
public class IcebergDataset implements PrioritizedCopyableDataset {
- private final String dbName;
- private final String inputTableName;
private final IcebergTable srcIcebergTable;
- /** Presumed destination {@link IcebergTable} exists */
+ /* CAUTION: *hopefully* `destIcebergTable` exists... although that's not necessarily been verified yet */
Review Comment:
Might be out of scope for this PR, but shouldn't our default behavior drop
that assumption and create the table if it is missing? That's the contract
for Hive distcp.
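For illustration only, the create-if-missing behavior raised here could be
sketched as below. `InMemoryCatalog` and its methods are hypothetical
stand-ins invented for this example; they are not Gobblin or Iceberg APIs, and
the actual fix (if any) may look quite different.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical in-memory stand-in for a destination catalog (not a Gobblin or Iceberg API). */
final class InMemoryCatalog {
  private final Set<String> tables = new HashSet<>();

  /** @return whether a table with this ID is already registered */
  boolean tableExists(String tableId) {
    return tables.contains(tableId);
  }

  /** Registers the table ID; a real catalog would also need a schema, location, etc. */
  void createTable(String tableId) {
    tables.add(tableId);
  }

  /** Creates the table only when it is missing; @return true if it had to be created. */
  boolean ensureTableExists(String tableId) {
    if (tableExists(tableId)) {
      return false;
    }
    createTable(tableId);
    return true;
  }
}
```

With a guard like this, the copy would not need to presume that the
destination table already exists.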
##########
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/iceberg/IcebergDataset.java:
##########
@@ -117,17 +113,17 @@ public Iterator<FileSet<CopyEntity>> getFileSetIterator(FileSystem targetFs, Cop
return createFileSets(targetFs, configuration);
}
- /** @return unique ID for this dataset, usable as a {@link CopyEntity}.fileset, for atomic publication grouping */
+ /** @return unique ID for dataset (based on the source-side table), usable as a {@link CopyEntity#getFileSet}, for atomic publication grouping */
protected String getFileSetId() {
Review Comment:
Nit: I think this would be better named getSourceTableId, because
getFileSetId() makes me think of some subset of the table rather than the
source table as a whole.
Issue Time Tracking
-------------------
Worklog Id: (was: 891669)
Time Spent: 1h (was: 50m)
> Qualify IcebergTable DatasetDescriptors (used by Iceberg-Distcp)
> ----------------------------------------------------------------
>
> Key: GOBBLIN-1961
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1961
> Project: Apache Gobblin
> Issue Type: Improvement
> Components: gobblin-compliance
> Reporter: Kip Kohn
> Assignee: Issac Buenrostro
> Priority: Major
> Time Spent: 1h
> Remaining Estimate: 0h
>
> `IcebergTable.getDatasetDescriptor` currently uses only the table name,
> although it should be qualified by the DB (source or destination,
> respectively)
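
As a rough sketch of the qualification described above: the helper below joins
a DB name and a table name into one identifier. `QualifiedTableName` is a
hypothetical class made up for this example, and the dot separator is an
assumption; the issue does not say how `IcebergTable.getDatasetDescriptor`
should format the qualified name.

```java
import java.util.Objects;

/** Hypothetical helper (not part of Gobblin): builds a DB-qualified table name. */
final class QualifiedTableName {
  private QualifiedTableName() {}

  /** Joins DB and table name with a dot (assumed separator), e.g. "src_db.events". */
  static String of(String dbName, String tableName) {
    Objects.requireNonNull(dbName, "dbName");
    Objects.requireNonNull(tableName, "tableName");
    return dbName + "." + tableName;
  }
}
```

Two tables that share a name but live in different DBs (for example a source
and a destination) then yield distinct identifiers, which the bare table name
cannot provide.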