[ 
https://issues.apache.org/jira/browse/GOBBLIN-1961?focusedWorklogId=891669&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-891669
 ]

ASF GitHub Bot logged work on GOBBLIN-1961:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 22/Nov/23 00:33
            Start Date: 22/Nov/23 00:33
    Worklog Time Spent: 10m 
      Work Description: Will-Lo commented on code in PR #3835:
URL: https://github.com/apache/gobblin/pull/3835#discussion_r1401365150


##########
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/iceberg/IcebergDataset.java:
##########
@@ -63,10 +63,8 @@
 @Slf4j
 @Getter
 public class IcebergDataset implements PrioritizedCopyableDataset {
-  private final String dbName;
-  private final String inputTableName;
   private final IcebergTable srcIcebergTable;
-  /** Presumed destination {@link IcebergTable} exists */
+  /* CAUTION: *hopefully* `destIcebergTable` exists... although that's not necessarily been verified yet */

Review Comment:
   Might be out of scope for this PR, but shouldn't our default behavior avoid 
that assumption and create the table if it's missing? That's the contract for 
Hive distcp.



##########
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/iceberg/IcebergDataset.java:
##########
@@ -117,17 +113,17 @@ public Iterator<FileSet<CopyEntity>> getFileSetIterator(FileSystem targetFs, Cop
     return createFileSets(targetFs, configuration);
   }
 
-  /** @return unique ID for this dataset, usable as a {@link CopyEntity}.fileset, for atomic publication grouping */
+  /** @return unique ID for dataset (based on the source-side table), usable as a {@link CopyEntity#getFileSet}, for atomic publication grouping */
   protected String getFileSetId() {

Review Comment:
   Nit: I think this would be better named `getSourceTableId`, because 
`getFileSetId()` makes me think of some subset of the table rather than the 
src table itself.





Issue Time Tracking
-------------------

    Worklog Id:     (was: 891669)
    Time Spent: 1h  (was: 50m)

> Qualify IcebergTable DatasetDescriptors (used by Iceberg-Distcp)
> ----------------------------------------------------------------
>
>                 Key: GOBBLIN-1961
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1961
>             Project: Apache Gobblin
>          Issue Type: Improvement
>          Components: gobblin-compliance
>            Reporter: Kip Kohn
>            Assignee: Issac Buenrostro
>            Priority: Major
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> `IcebergTable.getDatasetDescriptor` currently uses only the table name, 
> although it should be qualified by the DB (source or destination, 
> respectively)
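
The qualification the issue asks for can be sketched roughly as follows. This is only an illustration under stated assumptions, not Gobblin's implementation: `DescriptorNameSketch`, `unqualifiedName`, and `qualifiedName` are hypothetical names, and the dotted `db.table` form is an assumed convention that the issue does not specify.

```java
import java.util.Objects;

/** Hypothetical sketch; none of these names are part of Gobblin's API. */
final class DescriptorNameSketch {
  private DescriptorNameSketch() {}

  /** Behavior described in the issue: the DB is ignored, only the table name is used. */
  static String unqualifiedName(String dbName, String tableName) {
    return Objects.requireNonNull(tableName, "tableName");
  }

  /** Requested behavior: qualify the table name by its DB (dotted form is an assumption). */
  static String qualifiedName(String dbName, String tableName) {
    Objects.requireNonNull(dbName, "dbName");
    Objects.requireNonNull(tableName, "tableName");
    return dbName + "." + tableName;
  }

  public static void main(String[] args) {
    // The same table name in a source DB and a destination DB:
    // unqualified names collide, qualified names stay distinct.
    System.out.println(unqualifiedName("src_db", "events")
        .equals(unqualifiedName("dest_db", "events")));
    System.out.println(qualifiedName("src_db", "events"));
    System.out.println(qualifiedName("dest_db", "events"));
  }
}
```

With only the table name, a source and a destination table that share a name cannot be told apart by their descriptors; qualifying each by its own DB keeps them distinct.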



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
