[ 
https://issues.apache.org/jira/browse/GOBBLIN-1786?focusedWorklogId=847072&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-847072
 ]

ASF GitHub Bot logged work on GOBBLIN-1786:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 23/Feb/23 00:27
            Start Date: 23/Feb/23 00:27
    Worklog Time Spent: 10m 
      Work Description: meethngala commented on code in PR #3643:
URL: https://github.com/apache/gobblin/pull/3643#discussion_r1115135216


##########
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/iceberg/IcebergCatalog.java:
##########
@@ -17,10 +17,25 @@
 
 package org.apache.gobblin.data.management.copy.iceberg;
 
+import org.apache.iceberg.catalog.Catalog;
+
 
 /**
  * Any catalog from which to access {@link IcebergTable}s.
  */
 public interface IcebergCatalog {
   IcebergTable openTable(String dbName, String tableName);
+  String getCatalogUri();
+
+  /**
+   * Adding a sub interface to help us provide an association between {@link 
Catalog} and {@link IcebergCatalog}.
+   * This helps us resolve to the Catalog to its concrete implementation class
+   * Primarily needed to access `newTableOps` method which only certain {@link 
Catalog} derived classes open for public access
+   */
+  interface CatalogSpecifier {

Review Comment:
   Gobblin `IcebergCatalog` interface defines method that help us work with 
`IcebergDataset` and `IcebergTable` which we use for performing distcp. In 
future, we have use cases wherein the source and destination side catalogs 
might be different and we'll need to define some methods to identify them and 
perform operation(s) based on if its source or dest side catalog. Moreover, 
creating a `IcebergCatalog` class taking in normal `Catalog` will still need us 
to resolve to its concrete impl class... which is doable, but it will then lead 
to going via an approach defining additional configurations for every iceberg 
distcp job. It requires for us to get rid of `IcebergHiveCatalog` as well 
right? In that case we might have to revisit how we have currently implemented 
iceberg distcp and define the scope of each class/interface again which might 
require another PR altogether. 
   
   Moreover, the method `newTableOps` being protected within the concrete 
Catalog impl class eg. `HiveCatalog` requires us to have a raw instance of that 
class in order to perform table operations. Hence, throwing an exception for a 
method if its non-public won't solve our problem. Additionally, the approach 
that I am weighing here is that we have additional properties defining 
`CatalogType` instead of an additional sub-interface `CatalogSpecifier`





Issue Time Tracking
-------------------

    Worklog Id:     (was: 847072)
    Time Spent: 3h 50m  (was: 3h 40m)

> Support Other Catalog Types for Iceberg Distcp
> ----------------------------------------------
>
>                 Key: GOBBLIN-1786
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1786
>             Project: Apache Gobblin
>          Issue Type: Improvement
>            Reporter: Meeth Gala
>            Priority: Major
>          Time Spent: 3h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to