danielcweeks commented on a change in pull request #1844:
URL: https://github.com/apache/iceberg/pull/1844#discussion_r544515999
##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIO.java
##########
@@ -85,5 +101,6 @@ private S3Client client() {
@Override
public void initialize(Map<String, String> properties) {
Review comment:
As I mention in the `FileIO` comment, having both this path and the
constructor path for initialization causes some problems. If we use the
default constructor and `initialize()` is then called, don't we end up with two
S3 clients being created (and the first never properly closed)?
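One way to avoid the double-creation concern is to keep a single initialization
path and build the client lazily. The sketch below is only an illustration
under assumptions: `FakeClient` and `LazyClientIO` are hypothetical stand-ins
(not the AWS SDK's `S3Client` or the actual `S3FileIO`), and the counters exist
only to make client creation observable.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical stand-in for an S3 client; counts instances created and closed. */
class FakeClient implements AutoCloseable {
  static final AtomicInteger CREATED = new AtomicInteger();
  static final AtomicInteger CLOSED = new AtomicInteger();

  FakeClient() {
    CREATED.incrementAndGet();
  }

  @Override
  public void close() {
    CLOSED.incrementAndGet();
  }
}

/** Hypothetical FileIO-like class: the client is built lazily, exactly once. */
class LazyClientIO implements AutoCloseable {
  private Supplier<FakeClient> supplier = FakeClient::new;
  private FakeClient client; // created on first use only

  LazyClientIO() {
    // No client is built here, so a later initialize() has nothing to leak.
  }

  synchronized void initialize(Map<String, String> properties) {
    if (client != null) {
      throw new IllegalStateException("Cannot initialize after the client was created");
    }
    // Real code would derive the supplier from the properties; they are unused here.
    this.supplier = FakeClient::new;
  }

  synchronized FakeClient client() {
    if (client == null) {
      client = supplier.get();
    }
    return client;
  }

  @Override
  public synchronized void close() {
    if (client != null) {
      client.close();
      client = null;
    }
  }
}
```

With this shape, calling the default constructor followed by `initialize()`
creates no client at all until `client()` is first used, and `close()` releases
the only instance that was ever built.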
##########
File path: aws/src/main/java/org/apache/iceberg/aws/AwsProperties.java
##########
@@ -131,6 +170,8 @@
private String glueCatalogId;
private boolean glueCatalogSkipArchive;
+ private AwsClientFactory clientFactory;
Review comment:
I don't feel like the factory reference should be held by
`AwsProperties`. This also complicates serialization, as `AwsClientFactory`
now needs to be `Serializable`, and I'm not sure that it is at this point
because of the `UrlConnectionHttpClient.create()` reference. It seems like we
should separate `AwsProperties` and `AwsClientFactory`.
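To illustrate the serialization concern, here is a minimal sketch using only
the JDK. All names in it (`ClientFactory`, `DefaultClientFactory`,
`CoupledProperties`, `DecoupledProperties`, `SerializationCheck`) are
hypothetical and are not part of Iceberg or the AWS SDK. It assumes one
possible way to separate the two: the properties object keeps only the factory
class name and the factory is instantiated on demand.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Hypothetical factory interface; deliberately not Serializable. */
interface ClientFactory {
  String describe();
}

class DefaultClientFactory implements ClientFactory {
  @Override
  public String describe() {
    return "default";
  }
}

/** Properties that hold the factory directly: Java serialization fails. */
class CoupledProperties implements Serializable {
  private final ClientFactory factory = new DefaultClientFactory();
}

/** Properties that hold only the factory class name: serializable as-is. */
class DecoupledProperties implements Serializable {
  private final String factoryImpl;

  DecoupledProperties(String factoryImpl) {
    this.factoryImpl = factoryImpl;
  }

  /** Instantiates the factory reflectively through its no-arg constructor. */
  ClientFactory newFactory() {
    try {
      return (ClientFactory) Class.forName(factoryImpl)
          .getDeclaredConstructor()
          .newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Cannot create factory: " + factoryImpl, e);
    }
  }
}

class SerializationCheck {
  /** Returns false if Java serialization rejects the object graph. */
  static boolean canSerialize(Object obj) {
    try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
      out.writeObject(obj);
      return true;
    } catch (NotSerializableException e) {
      return false;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

In this sketch, `SerializationCheck.canSerialize(new CoupledProperties())`
returns false because the held factory is not `Serializable`, while a
`DecoupledProperties` instance serializes and can still produce a factory after
being shipped to another process.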
##########
File path: api/src/main/java/org/apache/iceberg/io/FileIO.java
##########
@@ -29,7 +29,7 @@
* must be serializable because various clients of Spark tables may initialize this once and pass
* it off to a separate module that would then interact with the streams.
*/
-public interface FileIO extends Serializable {
+public interface FileIO extends Serializable, CatalogConfigurable {
Review comment:
I'm not sure it actually makes sense for `FileIO` to implement
`CatalogConfigurable`. For S3FileIO, initialization just sets up the
`AwsProperties`, which are also set in the constructor, so the two paths
conflict (as you note in the comments). This makes S3FileIO pretty
confusing. (I'll add more comments on S3FileIO.)
##########
File path: aws/src/main/java/org/apache/iceberg/aws/AssumeRoleAwsClientFactory.java
##########
@@ -0,0 +1,104 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.aws;
+
+import java.util.Map;
+import java.util.UUID;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.util.PropertyUtil;
+import software.amazon.awssdk.awscore.client.builder.AwsClientBuilder;
+import software.amazon.awssdk.awscore.client.builder.AwsSyncClientBuilder;
+import software.amazon.awssdk.http.SdkHttpClient;
+import software.amazon.awssdk.http.urlconnection.UrlConnectionHttpClient;
+import software.amazon.awssdk.regions.Region;
+import software.amazon.awssdk.services.glue.GlueClient;
+import software.amazon.awssdk.services.kms.KmsClient;
+import software.amazon.awssdk.services.s3.S3Client;
+import software.amazon.awssdk.services.sts.StsClient;
+import software.amazon.awssdk.services.sts.auth.StsAssumeRoleCredentialsProvider;
+import software.amazon.awssdk.services.sts.model.AssumeRoleRequest;
+
+/**
+ * Example of a {@link AwsClientFactory} for the assume role use case.
+ * <p>
+ * The factory is initialized with a role ARN, region and optional external ID to assume from catalog properties,
+ * and configure all clients except the STS client to use the STS assume role credentials provider.
+ * The STS client is initialized using default credential and region chain
+ * and used to refresh the assume role session token.
+ */
+public class AssumeRoleAwsClientFactory implements AwsClientFactory {
+
+ private static final SdkHttpClient HTTP_CLIENT_DEFAULT = UrlConnectionHttpClient.create();
+
+ private String roleArn;
+ private String externalId;
+ private int timeout;
+ private Region region;
+
+ @Override
+ public S3Client s3() {
+ return S3Client.builder().applyMutation(this::configure).build();
+ }
+
+ @Override
+ public GlueClient glue() {
+ return GlueClient.builder().applyMutation(this::configure).build();
+ }
+
+ @Override
+ public KmsClient kms() {
+ return KmsClient.builder().applyMutation(this::configure).build();
+ }
+
+ @Override
+ public void initialize(Map<String, String> properties) {
+ roleArn = properties.get(AwsProperties.CLIENT_ASSUME_ROLE_ARN);
+ Preconditions.checkNotNull(roleArn,
+ "Cannot initialize AssumeRoleClientConfigFactory with null role ARN");
+ timeout = PropertyUtil.propertyAsInt(properties, AwsProperties.CLIENT_ASSUME_ROLE_TIMEOUT_SEC,
+ AwsProperties.CLIENT_ASSUME_ROLE_TIMEOUT_SEC_DEFAULT);
+ externalId = properties.get(AwsProperties.CLIENT_ASSUME_ROLE_EXTERNAL_ID);
+
+ String regionStr = properties.get(AwsProperties.CLIENT_ASSUME_ROLE_REGION);
+ Preconditions.checkNotNull(regionStr, "Cannot initialize AssumeRoleClientConfigFactory with null region");
+ region = Region.of(regionStr);
+ }
+
+ private <T extends AwsClientBuilder & AwsSyncClientBuilder> T configure(T clientBuilder) {
+ clientBuilder.credentialsProvider(StsAssumeRoleCredentialsProvider.builder()
+ .stsClient(StsClient.builder().httpClient(HTTP_CLIENT_DEFAULT).build())
Review comment:
A small issue I ran into when configuring `StsClient` was that `region()`
was required in some cases (it may have been due to assuming a cross-account
role in the same region, but I had to set it explicitly). If that's not a
common issue, we can ignore it for now.
##########
File path: aws/src/main/java/org/apache/iceberg/aws/AssumeRoleAwsClientFactory.java
##########
@@ -0,0 +1,104 @@
+ private <T extends AwsClientBuilder & AwsSyncClientBuilder> T configure(T clientBuilder) {
+ clientBuilder.credentialsProvider(StsAssumeRoleCredentialsProvider.builder()
Review comment:
As we did in a few other places (like S3OutputStream), it would be good
to break out the builders a little, as nested builders tend to be harder to
follow.