[
https://issues.apache.org/jira/browse/GOBBLIN-1709?focusedWorklogId=809364&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-809364
]
ASF GitHub Bot logged work on GOBBLIN-1709:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 16/Sep/22 01:17
Start Date: 16/Sep/22 01:17
Worklog Time Spent: 10m
Work Description: meethngala commented on code in PR #3560:
URL: https://github.com/apache/gobblin/pull/3560#discussion_r972531406
##########
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/iceberg/IcebergDatasetFinder.java:
##########
@@ -53,37 +54,39 @@ public class IcebergDatasetFinder implements IterableDatasetFinder<IcebergDatase
public List<IcebergDataset> findDatasets() throws IOException {
List<IcebergDataset> matchingDatasets = new ArrayList<>();
/*
- * Both Iceberg database name and table name are mandatory,
- * since we are currently only supporting Hive Catalog based Iceberg tables.
- * The design will support defaults and other catalogs in future releases.
+ * Both Iceberg database name and table name are mandatory based on current implementation.
+ * Later we may explore supporting datasets similar to Hive
*/
- if (properties.getProperty(ICEBERG_DB_NAME) == null || properties.getProperty(ICEBERG_TABLE_NAME) == null) {
- throw new IOException("Iceberg database name or Iceberg table name is missing");
+ if (StringUtils.isNotBlank(properties.getProperty(ICEBERG_DB_NAME)) || StringUtils.isNotBlank(properties.getProperty(ICEBERG_TABLE_NAME))) {
+ throw new IllegalArgumentException(String.format("Iceberg database name: {%s} or Iceberg table name: {%s} is missing",
+ ICEBERG_DB_NAME, ICEBERG_TABLE_NAME));
}
this.dbName = properties.getProperty(ICEBERG_DB_NAME);
this.tblName = properties.getProperty(ICEBERG_TABLE_NAME);
Configuration configuration = HadoopUtils.getConfFromProperties(properties);
IcebergCatalog icebergCatalog = IcebergCatalogFactory.create(configuration);
- IcebergTable icebergTable = icebergCatalog.openTable(dbName, tblName);
- // Currently, we only support one dataset per iceberg table
- matchingDatasets.add(createIcebergDataset(dbName, tblName, icebergTable, properties, fs));
+ /* Currently, we only support one dataset per iceberg table
+ * Error handling and verification of table existence will be included as part IcebergTable.getCurrentSnapshotInfo() in future releases.
Review Comment:
added a TODO for next iteration
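The name check in the hunk above can be sketched in isolation as follows. As quoted, the added condition uses StringUtils.isNotBlank, which would throw when either name is present; this sketch assumes the intended check is isBlank. The class name, the property key values, and the local isBlank helper (an approximate stand-in for commons-lang StringUtils.isBlank, used to avoid the external dependency) are hypothetical and not taken from the PR.

```java
import java.util.Properties;

class IcebergNameCheck {
  // Hypothetical keys: the real values of ICEBERG_DB_NAME and
  // ICEBERG_TABLE_NAME are not shown in the quoted diff.
  static final String DB_KEY = "example.iceberg.db.name";
  static final String TABLE_KEY = "example.iceberg.table.name";

  // Approximate stand-in for StringUtils.isBlank: null, empty, or whitespace only.
  static boolean isBlank(String s) {
    return s == null || s.trim().isEmpty();
  }

  // Returns {dbName, tblName}, or throws if either property is missing or blank.
  static String[] requireDbAndTable(Properties properties) {
    String dbName = properties.getProperty(DB_KEY);
    String tblName = properties.getProperty(TABLE_KEY);
    if (isBlank(dbName) || isBlank(tblName)) {
      throw new IllegalArgumentException(String.format(
          "Iceberg database name: {%s} or Iceberg table name: {%s} is missing",
          DB_KEY, TABLE_KEY));
    }
    return new String[] {dbName, tblName};
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty(DB_KEY, "db");
    props.setProperty(TABLE_KEY, "tbl");
    String[] names = requireDbAndTable(props);
    System.out.println(names[0] + "." + names[1]);
  }
}
```

Throwing IllegalArgumentException rather than IOException, as the diff does, signals a configuration problem instead of an I/O failure.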
Issue Time Tracking
-------------------
Worklog Id: (was: 809364)
Time Spent: 5h 20m (was: 5h 10m)
> Create work units for Hive Catalog based Iceberg Datasets to support Distcp
> for Iceberg
> ---------------------------------------------------------------------------------------
>
> Key: GOBBLIN-1709
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1709
> Project: Apache Gobblin
> Issue Type: New Feature
> Components: distcp-ng
> Reporter: Meeth Gala
> Assignee: Issac Buenrostro
> Priority: Major
> Time Spent: 5h 20m
> Remaining Estimate: 0h
>
> We want to support Distcp for Iceberg based datasets.
> As a pilot, we are starting with Hive Catalog and will expand the
> functionality to cover all Iceberg based datasets.
>