ChristinaTech commented on code in PR #6742:
URL: https://github.com/apache/iceberg/pull/6742#discussion_r1186773219


##########
aws/src/main/java/org/apache/iceberg/aws/glue/GlueCatalog.java:
##########
@@ -111,7 +117,8 @@ public GlueCatalog() {}
 
   @Override
   public void initialize(String name, Map<String, String> properties) {
-    this.catalogProperties = ImmutableMap.copyOf(properties);
+    this.catalogProperties = new HashMap<>();
+    catalogProperties.putAll(properties);

Review Comment:
   Making CatalogProperties non-immutable in order to effectively side-load 
information into the AWS client factory post-initialization is a dangerous 
precedent to set. In addition, it doesn't seem to actually accomplish anything 
besides allowing the GlueCatalog to suddenly switch regions 
post-initialization, which is likely to introduce some dangerous side effects.



##########
aws/src/main/java/org/apache/iceberg/aws/glue/GlueCatalog.java:
##########
@@ -437,6 +444,81 @@ public void renameTable(TableIdentifier from, 
TableIdentifier to) {
     LOG.info("Successfully renamed table from {} to {}", from, to);
   }
 
+  @Override
+  public org.apache.iceberg.Table registerTable(
+      TableIdentifier identifier, String metadataFileLocation) {
+    Preconditions.checkArgument(
+        isValidIdentifier(identifier), "Table identifier to register is 
invalid: " + identifier);
+    Preconditions.checkArgument(
+        metadataFileLocation != null && !metadataFileLocation.isEmpty(),
+        "Cannot register an empty metadata file location as a table");
+
+    // keep the original behavior when force-register-table flag is off
+    if (!awsProperties.glueCatalogForceRegisterTable()) {
+      return super.registerTable(identifier, metadataFileLocation);
+    }
+
+    String factoryImpl =
+        PropertyUtil.propertyAsString(catalogProperties, 
AwsProperties.CLIENT_FACTORY, null);
+    if (factoryImpl != null && 
factoryImpl.equals(AssumeRoleAwsClientFactory.class.getName())) {

Review Comment:
   It hurts extensibility to make logic specific to a particular implementation 
class. If, for example, a customer needs to extend `AssumeRoleAwsClientFactory` 
to add some functionality for their own use, this logic will break as the class 
name will no longer match.



##########
aws/src/main/java/org/apache/iceberg/aws/AwsProperties.java:
##########
@@ -187,6 +187,18 @@ public class AwsProperties implements Serializable {
    */
   public static final String GLUE_CATALOG_ENDPOINT = "glue.endpoint";
 
+  /**
+   * If set, Glue will always update the catalog table if the table already 
exists in glue catalog.
+   * By default, Glue catalog will only be able to create new table and will 
throw
+   * AlreadyExistsException when register an existing table name.
+   */
+  public static final String GLUE_CATALOG_FORCE_REGISTER_TABLE = 
"glue.force-register-table";
+
+  public static final boolean GLUE_CATALOG_FORCE_REGISTER_TABLE_DEFAULT = 
false;
+
+  /** Configure the Glue Catalog S3 FileIO Region to allow cross region s3 
access */
+  public static final String GLUE_CATALOG_FILE_IO_REGION = 
"glue.catalog-file-io-region";

Review Comment:
   So we already have `client.region` which sets the region for all services. I 
am going to be introducing a change in the near future that will allow setting 
per-region for the default AWS Client Factory (and we can extend it to the 
Assume Role client factory as well), so we probably want a more general 
per-service naming scheme. Based on how existing parameters are formatted where 
`glue.` and `s3.` are already established prefixes, most likely something like: 
`glue.region`, `s3.region`, `kms.region`, etc...



##########
aws/src/main/java/org/apache/iceberg/aws/glue/GlueCatalog.java:
##########
@@ -437,6 +444,81 @@ public void renameTable(TableIdentifier from, 
TableIdentifier to) {
     LOG.info("Successfully renamed table from {} to {}", from, to);
   }
 
+  @Override
+  public org.apache.iceberg.Table registerTable(
+      TableIdentifier identifier, String metadataFileLocation) {
+    Preconditions.checkArgument(
+        isValidIdentifier(identifier), "Table identifier to register is 
invalid: " + identifier);
+    Preconditions.checkArgument(
+        metadataFileLocation != null && !metadataFileLocation.isEmpty(),
+        "Cannot register an empty metadata file location as a table");
+
+    // keep the original behavior when force-register-table flag is off
+    if (!awsProperties.glueCatalogForceRegisterTable()) {
+      return super.registerTable(identifier, metadataFileLocation);
+    }
+
+    String factoryImpl =
+        PropertyUtil.propertyAsString(catalogProperties, 
AwsProperties.CLIENT_FACTORY, null);
+    if (factoryImpl != null && 
factoryImpl.equals(AssumeRoleAwsClientFactory.class.getName())) {
+      // overwrite client assume_role_region for file IO to make cross region 
call
+      String catalogFileIORegion = awsProperties.getGlueCatalogFileIORegion();
+      if (catalogFileIORegion != null) {
+        catalogProperties.put(AwsProperties.CLIENT_ASSUME_ROLE_REGION, 
catalogFileIORegion);
+      }
+    }
+
+    TableOperations ops = newTableOps(identifier);
+    InputFile metadataFile = ops.io().newInputFile(metadataFileLocation);
+    TableMetadata metadata = TableMetadataParser.read(ops.io(), metadataFile);
+
+    Map<String, String> tableParameters =
+        ImmutableMap.of(
+            BaseMetastoreTableOperations.TABLE_TYPE_PROP,
+            
BaseMetastoreTableOperations.ICEBERG_TABLE_TYPE_VALUE.toLowerCase(Locale.ENGLISH),
+            BaseMetastoreTableOperations.METADATA_LOCATION_PROP,
+            metadataFileLocation);
+
+    String databaseName =
+        IcebergToGlueConverter.getDatabaseName(
+            identifier, awsProperties.glueCatalogSkipNameValidation());
+    String tableName =
+        IcebergToGlueConverter.getTableName(
+            identifier, awsProperties.glueCatalogSkipNameValidation());
+
+    TableInput tableInput =
+        TableInput.builder()
+            .applyMutation(
+                builder ->
+                    IcebergToGlueConverter.setTableInputInformation(
+                        builder, metadata, tableParameters))
+            .name(tableName)
+            .tableType(GlueTableOperations.GLUE_EXTERNAL_TABLE_TYPE)
+            .parameters(tableParameters)
+            .build();
+
+    try {
+      glue.createTable(
+          
CreateTableRequest.builder().databaseName(databaseName).tableInput(tableInput).build());
+    } catch (software.amazon.awssdk.services.glue.model.AlreadyExistsException 
e) {
+      GetTableResponse response =
+          glue.getTable(
+              
GetTableRequest.builder().databaseName(databaseName).name(tableName).build());
+      String versionId = response.table().versionId();
+      glue.updateTable(
+          UpdateTableRequest.builder()
+              .databaseName(databaseName)
+              .tableInput(tableInput)
+              .versionId(versionId)
+              .build());
+    } catch (EntityNotFoundException e) {

Review Comment:
   So this logic is a combination of a fork of the logic in 
`BaseMetastoreCatalog.registerTable` and 
`GlueTableOperations.persistGlueTable`. As `GlueTableOperations` also has 
access to `AwsProperties`, I would recommend refactoring the logic in 
GlueTableOperations so that `persistGlueTable` can conditionally fall back to 
its update mode as that improves code reuse.
   
   If the concern is the creation of an extra metadata file, it looks like 
`GlueTableOperations` already checks whether it needs to during the 
`writeNewMetadataIfRequired` function and considering tableMetadata is always 
set for `registerTable`, it will never choose to write a new metadata file 
anyways.



##########
aws/src/main/java/org/apache/iceberg/aws/glue/GlueCatalog.java:
##########
@@ -437,6 +444,81 @@ public void renameTable(TableIdentifier from, 
TableIdentifier to) {
     LOG.info("Successfully renamed table from {} to {}", from, to);
   }
 
+  @Override
+  public org.apache.iceberg.Table registerTable(
+      TableIdentifier identifier, String metadataFileLocation) {
+    Preconditions.checkArgument(
+        isValidIdentifier(identifier), "Table identifier to register is 
invalid: " + identifier);
+    Preconditions.checkArgument(
+        metadataFileLocation != null && !metadataFileLocation.isEmpty(),
+        "Cannot register an empty metadata file location as a table");
+
+    // keep the original behavior when force-register-table flag is off
+    if (!awsProperties.glueCatalogForceRegisterTable()) {
+      return super.registerTable(identifier, metadataFileLocation);
+    }
+
+    String factoryImpl =
+        PropertyUtil.propertyAsString(catalogProperties, 
AwsProperties.CLIENT_FACTORY, null);
+    if (factoryImpl != null && 
factoryImpl.equals(AssumeRoleAwsClientFactory.class.getName())) {
+      // overwrite client assume_role_region for file IO to make cross region 
call
+      String catalogFileIORegion = awsProperties.getGlueCatalogFileIORegion();
+      if (catalogFileIORegion != null) {
+        catalogProperties.put(AwsProperties.CLIENT_ASSUME_ROLE_REGION, 
catalogFileIORegion);
+      }

Review Comment:
   Can't this logic and the associated parameter be removed and replaced with 
just setting `AwsProperties.CLIENT_ASSUME_ROLE_REGION` directly? This change is 
not in any way scoped to this call or AWS service, it effectively creates a 
case where what region the GlueCatalog is calling can suddenly change 
post-initialization after the first time `registerTable` is called.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to