[
https://issues.apache.org/jira/browse/GOBBLIN-1786?focusedWorklogId=847072&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-847072
]
ASF GitHub Bot logged work on GOBBLIN-1786:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 23/Feb/23 00:27
Start Date: 23/Feb/23 00:27
Worklog Time Spent: 10m
Work Description: meethngala commented on code in PR #3643:
URL: https://github.com/apache/gobblin/pull/3643#discussion_r1115135216
##########
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/iceberg/IcebergCatalog.java:
##########
@@ -17,10 +17,25 @@
package org.apache.gobblin.data.management.copy.iceberg;
+import org.apache.iceberg.catalog.Catalog;
+
/**
* Any catalog from which to access {@link IcebergTable}s.
*/
public interface IcebergCatalog {
IcebergTable openTable(String dbName, String tableName);
+ String getCatalogUri();
+
+ /**
+ * Adding a sub interface to help us provide an association between {@link
Catalog} and {@link IcebergCatalog}.
+ * This helps us resolve to the Catalog to its concrete implementation class
+ * Primarily needed to access `newTableOps` method which only certain {@link
Catalog} derived classes open for public access
+ */
+ interface CatalogSpecifier {
Review Comment:
Gobblin `IcebergCatalog` interface defines method that help us work with
`IcebergDataset` and `IcebergTable` which we use for performing distcp. In
future, we have use cases wherein the source and destination side catalogs
might be different and we'll need to define some methods to identify them and
perform operation(s) based on if its source or dest side catalog. Moreover,
creating a `IcebergCatalog` class taking in normal `Catalog` will still need us
to resolve to its concrete impl class... which is doable, but it will then lead
to going via an approach defining additional configurations for every iceberg
distcp job. It requires for us to get rid of `IcebergHiveCatalog` as well
right? In that case we might have to revisit how we have currently implemented
iceberg distcp and define the scope of each class/interface again which might
require another PR altogether.
Moreover, the method `newTableOps` being protected within the concrete
Catalog impl class eg. `HiveCatalog` requires us to have a raw instance of that
class in order to perform table operations. Hence, throwing an exception for a
method if its non-public won't solve our problem. Additionally, the approach
that I am weighing here is that we have additional properties defining
`CatalogType` instead of an additional sub-interface `CatalogSpecifier`
Issue Time Tracking
-------------------
Worklog Id: (was: 847072)
Time Spent: 3h 50m (was: 3h 40m)
> Support Other Catalog Types for Iceberg Distcp
> ----------------------------------------------
>
> Key: GOBBLIN-1786
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1786
> Project: Apache Gobblin
> Issue Type: Improvement
> Reporter: Meeth Gala
> Priority: Major
> Time Spent: 3h 50m
> Remaining Estimate: 0h
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)