[
https://issues.apache.org/jira/browse/HADOOP-18565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17701292#comment-17701292
]
ASF GitHub Bot commented on HADOOP-18565:
-----------------------------------------
steveloughran commented on code in PR #5421:
URL: https://github.com/apache/hadoop/pull/5421#discussion_r1138885312
##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/NetworkBinding.java:
##########
@@ -23,9 +23,9 @@
import java.net.URI;
import java.net.URISyntaxException;
-import com.amazonaws.ClientConfiguration;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
+import software.amazon.awssdk.http.apache.ApacheHttpClient;
Review Comment:
is this going to break when the hadoop-aws module is running against an
unshaded version of the SDK? the reason this class already plays reflection
games is to downgrade gracefully in that situation: you can't switch to
wildfly and openssl, but the client can at least keep working.
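for reference, a rough sketch of the downgrade pattern being referred to — the binding class name and method shape here are illustrative, not necessarily what `NetworkBinding` actually does:
```java
// Sketch only: load the shaded socket-factory binding reflectively and
// downgrade quietly if the shaded classes are absent from the classpath.
private static void bindSSLChannelMode(ApacheHttpClient.Builder builder) {
  try {
    Class<?> binding = Class.forName(
        "org.apache.hadoop.fs.s3a.impl.ConfigureShadedAWSSocketFactory");
    // ...invoke the binding's configure method via reflection...
  } catch (ClassNotFoundException | LinkageError e) {
    // unshaded SDK on the classpath: wildfly/openssl can't be wired in,
    // but the client still works with the JVM default SSL socket factory.
    LOG.debug("Unable to load shaded SDK socket factory binding", e);
  }
}
```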
##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/auth/SignerFactory.java:
##########
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a.auth;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import software.amazon.awssdk.auth.signer.Aws4Signer;
+import software.amazon.awssdk.auth.signer.Aws4UnsignedPayloadSigner;
+import software.amazon.awssdk.auth.signer.AwsS3V4Signer;
+import software.amazon.awssdk.core.signer.NoOpSigner;
+import software.amazon.awssdk.core.signer.Signer;
+
+import java.io.IOException;
Review Comment:
nit: make the java imports the topmost block of imports
##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java:
##########
@@ -834,58 +834,28 @@ protected static S3AStorageStatistics createStorageStatistics(
* @throws UnknownStoreException the bucket is absent
* @throws IOException any other problem talking to S3
*/
- // TODO: Review: this used to call doesBucketExist in v1, which does not check permissions,
- // not even read access.
@Retries.RetryTranslated
protected void verifyBucketExists() throws UnknownStoreException,
IOException {
if (!invoker.retry("doesBucketExist", bucket, true,
trackDurationOfOperation(getDurationTrackerFactory(),
STORE_EXISTS_PROBE.getSymbol(),
() -> {
try {
+ if (bucketRegions.containsKey(bucket)) {
+ return true;
+ }
s3Client.headBucket(HeadBucketRequest.builder().bucket(bucket).build());
return true;
- } catch (NoSuchBucketException e) {
- return false;
- }
- }))) {
- throw new UnknownStoreException("s3a://" + bucket + "/", " Bucket does " + "not exist");
- }
- }
-
- /**
- * Verify that the bucket exists. This will correctly throw an exception
- * when credentials are invalid.
- * TODO: Review. May be redundant in v2.
- * Retry policy: retrying, translated.
- * @throws UnknownStoreException the bucket is absent
- * @throws IOException any other problem talking to S3
- */
- @Retries.RetryTranslated
- protected void verifyBucketExistsV2()
- throws UnknownStoreException, IOException {
- if (!invoker.retry("doesBucketExistV2", bucket, true,
- trackDurationOfOperation(getDurationTrackerFactory(),
- STORE_EXISTS_PROBE.getSymbol(),
- () -> {
- // Bug in SDK always returns `true` for AccessPoint ARNs with `doesBucketExistV2()`
- // expanding implementation to use ARNs and buckets correctly
- try {
- s3Client.getBucketAcl(GetBucketAclRequest.builder()
- .bucket(bucket)
- .build());
} catch (AwsServiceException ex) {
int statusCode = ex.statusCode();
if (statusCode == SC_404_NOT_FOUND ||
- (statusCode == SC_403_FORBIDDEN &&
- ex.getMessage().contains(AP_INACCESSIBLE))) {
+ (statusCode == SC_403_FORBIDDEN && accessPoint != null)) {
return false;
}
}
return true;
}))) {
- throw new UnknownStoreException("s3a://" + bucket + "/", " Bucket does "
- + "not exist");
+ throw new UnknownStoreException("s3a://" + bucket + "/", " Bucket does " + "not exist");
Review Comment:
thought. should the fs.s3a.endpoint value be included, as trying to talk to
an on-prem store without setting that value is not entirely uncommon?
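a sketch of what that could look like, assuming the endpoint is reachable from getConf() and using the existing `ENDPOINT` (`fs.s3a.endpoint`) constant:
```java
// Sketch: include the configured endpoint so a missing fs.s3a.endpoint
// for an on-prem store is obvious from the failure message.
throw new UnknownStoreException("s3a://" + bucket + "/",
    " Bucket does not exist; fs.s3a.endpoint = "
        + getConf().getTrimmed(ENDPOINT, "(unset)"));
```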
##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java:
##########
@@ -834,58 +834,28 @@ protected static S3AStorageStatistics createStorageStatistics(
* @throws UnknownStoreException the bucket is absent
* @throws IOException any other problem talking to S3
*/
- // TODO: Review: this used to call doesBucketExist in v1, which does not check permissions,
- // not even read access.
@Retries.RetryTranslated
protected void verifyBucketExists() throws UnknownStoreException,
IOException {
if (!invoker.retry("doesBucketExist", bucket, true,
trackDurationOfOperation(getDurationTrackerFactory(),
STORE_EXISTS_PROBE.getSymbol(),
Review Comment:
we don't audit this. it should be invoked via `trackDurationAndSpan()` so we
can track these calls in the s3 logs
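something along the lines of the pattern already used for the region probe elsewhere in this PR; a sketch, not the final wiring:
```java
// Sketch: run the existence probe inside an audit span so the HEAD
// request is attributed to this operation in the S3 server logs.
trackDurationAndSpan(STORE_EXISTS_PROBE, bucket, null, () ->
    invoker.retry("doesBucketExist", bucket, true, () -> {
      s3Client.headBucket(HeadBucketRequest.builder().bucket(bucket).build());
      return true;
    }));
```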
##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/AWSClientConfig.java:
##########
@@ -136,7 +143,7 @@ public static NettyNioAsyncHttpClient.Builder createAsyncHttpClientBuilder(Confi
httpClientBuilder.readTimeout(Duration.ofSeconds(socketTimeout));
httpClientBuilder.writeTimeout(Duration.ofSeconds(socketTimeout));
- // TODO: Need to set ssl socket factory, as done in
+ // TODO: Don't think you can set a socket factory for the netty client.
Review Comment:
oh well. why not create a followup JIRA which could be looked at in future.
##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/HeaderProcessing.java:
##########
@@ -276,9 +277,28 @@ private Map<String, byte[]> retrieveHeaders(
final Statistic statistic) throws IOException {
StoreContext context = getStoreContext();
String objectKey = context.pathToKey(path);
- HeadObjectResponse md;
String symbol = statistic.getSymbol();
S3AStatisticsContext instrumentation = context.getInstrumentation();
+ Map<String, byte[]> headers = new TreeMap<>();
+ HeadObjectResponse md;
+
+ // Attempting to get metadata for the root, so use head bucket.
+ if (objectKey.isEmpty()) {
Review Comment:
never thought about this. is this done in a test? i know getFileStatus(/)
just makes up a status without talking to the store.
returning headers on a getXAttr call would actually be useful if it returned
interesting headers.
##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/InconsistentS3ClientFactory.java:
##########
@@ -45,7 +46,7 @@ public class InconsistentS3ClientFactory extends
DefaultS3ClientFactory {
Review Comment:
i think we could just cut this class; it was there for generating
inconsistencies more reliably, and mockito is better for managed fault injection.
proposal: do it as a followup.
##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/HeaderProcessing.java:
##########
@@ -276,9 +277,28 @@ private Map<String, byte[]> retrieveHeaders(
final Statistic statistic) throws IOException {
StoreContext context = getStoreContext();
String objectKey = context.pathToKey(path);
- HeadObjectResponse md;
String symbol = statistic.getSymbol();
S3AStatisticsContext instrumentation = context.getInstrumentation();
+ Map<String, byte[]> headers = new TreeMap<>();
+ HeadObjectResponse md;
+
+ // Attempting to get metadata for the root, so use head bucket.
+ if (objectKey.isEmpty()) {
+ HeadBucketResponse headBucketResponse =
+ trackDuration(instrumentation, symbol, () -> callbacks.getBucketMetadata());
+
+ if (headBucketResponse.sdkHttpResponse() != null
+ && headBucketResponse.sdkHttpResponse().headers() != null
+ && headBucketResponse.sdkHttpResponse().headers().get("Content-Type") != null) {
Review Comment:
make a constant for this; the class did use
com.amazonaws.services.s3.Headers for it in the past. I don't see any standard
set of headers declared in hadoop-common, so just do it in this file
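a minimal version, kept local to HeaderProcessing as suggested; the null-check chain above would then read `...headers().get(CONTENT_TYPE) != null`:
```java
// Sketch: local constant for the header name, replacing the old
// com.amazonaws.services.s3.Headers reference from the v1 SDK.
private static final String CONTENT_TYPE = "Content-Type";
```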
##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AUtils.java:
##########
@@ -807,10 +742,8 @@ private static AWSCredentialsProvider createAWSV1CredentialProvider(
* @return the instantiated class
* @throws IOException on any instantiation failure.
*/
- private static AwsCredentialsProvider createAWSV2CredentialProvider(
- Configuration conf,
- Class<?> credClass,
- @Nullable URI uri) throws IOException {
+ private static AwsCredentialsProvider createAWSV2CredentialProvider(Configuration conf,
Review Comment:
how about moving this credential stuff into its own class under s3a.auth?
it's complicated enough to merit it, and pulling it out should make maintenance
easier, as other changes to S3AUtils won't interfere.
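a hypothetical skeleton of that extraction; the class name and placement are illustrative only:
```java
package org.apache.hadoop.fs.s3a.auth;

import java.io.IOException;
import java.net.URI;

import javax.annotation.Nullable;

import software.amazon.awssdk.auth.credentials.AwsCredentialsProvider;

import org.apache.hadoop.conf.Configuration;

/** Hypothetical new home for the credential-provider plumbing in S3AUtils. */
public final class CredentialProviderFactory {

  private CredentialProviderFactory() {
  }

  /** Body lifted unchanged from S3AUtils.createAWSV2CredentialProvider(). */
  public static AwsCredentialsProvider createAWSV2CredentialProvider(
      Configuration conf,
      Class<?> credClass,
      @Nullable URI uri) throws IOException {
    throw new UnsupportedOperationException("sketch only");
  }
}
```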
##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/auth/delegation/ITestSessionDelegationInFileystem.java:
##########
@@ -330,10 +330,9 @@ public void testDelegatedFileSystem() throws Throwable {
+ " if role restricted, permissions are tightened.");
S3AFileSystem fs = getFileSystem();
// force a probe of the remote FS to make sure its endpoint is valid
- // TODO: Previously a call to getObjectMetadata for a base path, ie with an empty key would
- // return some metadata. (bucket region, content type). headObject() fails without a key, check
- // how this can be fixed.
- // fs.getObjectMetadata(new Path("/"));
+ // TODO: Check what should happen here. Calling headObject() on the root path fails in V2,
Review Comment:
let's not worry about it if it is too hard to do
##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java:
##########
@@ -725,18 +728,15 @@ private void doBucketProbing() throws IOException {
LOG.debug("skipping check for bucket existence");
break;
case 1:
- logDnsLookup(getConf());
- verifyBucketExists();
- break;
case 2:
logDnsLookup(getConf());
- verifyBucketExistsV2();
+ verifyBucketExists();
break;
default:
// we have no idea what this is, assume it is from a later release.
LOG.warn("Unknown bucket probe option {}: {}; falling back to check #2",
Review Comment:
1. text is out of date
2. how about we fall back to "not checking"? it is the current default after
all
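i.e. something like the following; a sketch where the constant and variable names are assumed to follow the existing doBucketProbing() switch:
```java
default:
  // Sketch: treat unknown probe values like probe == 0, the current
  // default, instead of forcing a bucket existence check.
  LOG.warn("Unknown bucket probe option {}: {}; skipping check for bucket existence",
      S3A_BUCKET_PROBE, bucketProbe);
  break;
```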
##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java:
##########
@@ -987,12 +963,81 @@ private void bindAWSClient(URI name, boolean dtEnabled) throws IOException {
.withRequesterPays(conf.getBoolean(ALLOW_REQUESTER_PAYS,
DEFAULT_ALLOW_REQUESTER_PAYS))
.withExecutionInterceptors(auditManager.createExecutionInterceptors())
.withMinimumPartSize(partSize)
- .withTransferManagerExecutor(unboundedThreadPool);
+ .withTransferManagerExecutor(unboundedThreadPool)
+ .withRegion(region);
S3ClientFactory clientFactory =
ReflectionUtils.newInstance(s3ClientFactoryClass, conf);
- s3Client = clientFactory.createS3ClientV2(getUri(), parameters);
+ s3Client = clientFactory.createS3Client(getUri(), parameters);
s3AsyncClient = clientFactory.createS3AsyncClient(getUri(), parameters);
- transferManager = clientFactory.createS3TransferManager(getUri(), parameters);
+ transferManager = clientFactory.createS3TransferManager(s3AsyncClient);
+ }
+
+ /**
+ * Get the bucket region.
+ *
+ * @param region AWS S3 Region set in the config. This property may not be set, in which case
+ * ask S3 for the region.
+ * @return region of the bucket.
+ */
+ private Region getS3Region(String region) throws IOException {
+
+ if (!StringUtils.isBlank(region)) {
+ return Region.of(region);
+ }
+
+ Region cachedRegion = bucketRegions.get(bucket);
+
+ if (cachedRegion != null) {
+ LOG.debug("Got region {} for bucket {} from cache", cachedRegion, bucket);
+ return cachedRegion;
+ }
+
+ Region s3Region = trackDurationAndSpan(STORE_REGION_PROBE, bucket, null,
+ () -> invoker.retry("getS3Region", bucket, true, () -> {
+ try {
+
+ LOG.warn(
Review Comment:
use a LogExactlyOnce to keep the noise down
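i.e. the hadoop-common helper `org.apache.hadoop.fs.store.LogExactlyOnce`; a sketch of the usual pattern:
```java
// Sketch: a static wrapper that logs a given message at most once per process.
private static final LogExactlyOnce WARN_REGION_PROBE = new LogExactlyOnce(LOG);

// inside getS3Region(), replacing the LOG.warn():
WARN_REGION_PROBE.warn("Region not set; probing S3 for the region of bucket {}", bucket);
```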
##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java:
##########
@@ -987,12 +963,81 @@ private void bindAWSClient(URI name, boolean dtEnabled) throws IOException {
.withRequesterPays(conf.getBoolean(ALLOW_REQUESTER_PAYS,
DEFAULT_ALLOW_REQUESTER_PAYS))
.withExecutionInterceptors(auditManager.createExecutionInterceptors())
.withMinimumPartSize(partSize)
- .withTransferManagerExecutor(unboundedThreadPool);
+ .withTransferManagerExecutor(unboundedThreadPool)
+ .withRegion(region);
S3ClientFactory clientFactory =
ReflectionUtils.newInstance(s3ClientFactoryClass, conf);
- s3Client = clientFactory.createS3ClientV2(getUri(), parameters);
+ s3Client = clientFactory.createS3Client(getUri(), parameters);
s3AsyncClient = clientFactory.createS3AsyncClient(getUri(), parameters);
- transferManager = clientFactory.createS3TransferManager(getUri(), parameters);
+ transferManager = clientFactory.createS3TransferManager(s3AsyncClient);
+ }
+
+ /**
+ * Get the bucket region.
+ *
+ * @param region AWS S3 Region set in the config. This property may not be set, in which case
+ * ask S3 for the region.
+ * @return region of the bucket.
+ */
+ private Region getS3Region(String region) throws IOException {
Review Comment:
is there an elegant way the bucket region cache/lookup could be implemented
in a self-contained class in the impl package? we are always struggling
to keep the size of the S3AFileSystem class under control for manageability
and maintenance.
imagine some RegionResolver interface, with the implementation given the
config and credentials and talking to s3. the mock classes could then just have
an implementation which made up a region. It would also allow for testing of
the caching mechanism.
this could be done as a followup to this PR; extend
ExecutingStoreOperation with a new op
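roughly this shape — all names here are illustrative:
```java
// Hypothetical self-contained resolver in the impl package; the production
// implementation holds the config/credentials and talks to S3, while tests
// can substitute a stub.
public interface RegionResolver {

  /** Resolve the region of a bucket: cache hit first, HEAD probe otherwise. */
  Region resolveRegion(String bucket) throws IOException;
}

// A test double is then a one-liner:
RegionResolver stub = bucket -> Region.US_EAST_1;
```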
##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java:
##########
@@ -435,6 +436,8 @@ public class S3AFileSystem extends FileSystem implements StreamCapabilities,
*/
private final Set<Path> deleteOnExit = new TreeSet<>();
+ private final Map<String, Region> bucketRegions = new HashMap<>();
Review Comment:
this would need to be static to work
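i.e. (a sketch, also folding in the concurrency point made on SignerFactory below):
```java
// Sketch: static so the cache is shared across S3AFileSystem instances,
// concurrent so parallel initialize() calls are safe.
private static final Map<String, Region> BUCKET_REGIONS = new ConcurrentHashMap<>();
```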
##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/auth/SignerFactory.java:
##########
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a.auth;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import software.amazon.awssdk.auth.signer.Aws4Signer;
+import software.amazon.awssdk.auth.signer.Aws4UnsignedPayloadSigner;
+import software.amazon.awssdk.auth.signer.AwsS3V4Signer;
+import software.amazon.awssdk.core.signer.NoOpSigner;
+import software.amazon.awssdk.core.signer.Signer;
+
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.Map;
+
+import org.apache.hadoop.fs.s3a.S3AUtils;
+
+
+/**
+ * Signer factory used to register and create signers.
+ */
+public final class SignerFactory {
Review Comment:
worried about thread safety here as the map isn't concurrent and there are
public methods to update it. ConcurrentHashMap would be safer
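i.e. just:
```java
// Sketch: registerSigner() is public and may be called from any thread,
// so back the registry with a ConcurrentHashMap.
private static final Map<String, Class<? extends Signer>> SIGNERS
    = new ConcurrentHashMap<>();
```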
##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/auth/SignerFactory.java:
##########
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a.auth;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import software.amazon.awssdk.auth.signer.Aws4Signer;
+import software.amazon.awssdk.auth.signer.Aws4UnsignedPayloadSigner;
+import software.amazon.awssdk.auth.signer.AwsS3V4Signer;
+import software.amazon.awssdk.core.signer.NoOpSigner;
+import software.amazon.awssdk.core.signer.Signer;
+
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.Map;
+
+import org.apache.hadoop.fs.s3a.S3AUtils;
+
+
+/**
+ * Signer factory used to register and create signers.
+ */
+public final class SignerFactory {
+
+ private static final Logger LOG = LoggerFactory.getLogger(SignerFactory.class);
+ public static final String VERSION_FOUR_SIGNER = "AWS4SignerType";
+ public static final String VERSION_FOUR_UNSIGNED_PAYLOAD_SIGNER = "AWS4UnsignedPayloadSignerType";
+ public static final String NO_OP_SIGNER = "NoOpSignerType";
+ private static final String S3_V4_SIGNER = "AWSS3V4SignerType";
+
+ private static final Map<String, Class<? extends Signer>> SIGNERS
+ = new HashMap<>();
+
+ static {
+ // Register the standard signer types.
+ SIGNERS.put(VERSION_FOUR_SIGNER, Aws4Signer.class);
+ SIGNERS.put(VERSION_FOUR_UNSIGNED_PAYLOAD_SIGNER, Aws4UnsignedPayloadSigner.class);
+ SIGNERS.put(NO_OP_SIGNER, NoOpSigner.class);
+ SIGNERS.put(S3_V4_SIGNER, AwsS3V4Signer.class);
+ }
+
+
+ private SignerFactory() {
+ }
+
+ /**
+ * Register an implementation class for the given signer type.
+ *
+ * @param signerType The name of the signer type to register.
+ * @param signerClass The class implementing the given signature protocol.
+ */
+ public static void registerSigner(
+ final String signerType,
+ final Class<? extends Signer> signerClass) {
+
+ if (signerType == null) {
+ throw new IllegalArgumentException("signerType cannot be null");
+ }
+ if (signerClass == null) {
+ throw new IllegalArgumentException("signerClass cannot be null");
+ }
+
+ SIGNERS.put(signerType, signerClass);
+ }
+
+ /**
+ * Check if the signer has already been registered.
+ * @param signerType signer to get
+ */
+ public static void getSigner(String signerType) {
Review Comment:
prefer a name like "verifySignerRegistered", as get* methods usually
return something
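something like the following; the body shown is a guess at the intent, not the current code:
```java
/**
 * Verify that the signer type has been registered.
 * @param signerType signer to check
 */
public static void verifySignerRegistered(String signerType) {
  if (!SIGNERS.containsKey(signerType)) {
    throw new IllegalArgumentException("unknown signer type: " + signerType);
  }
}
```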
##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3ACannedACLs.java:
##########
@@ -57,8 +57,6 @@ protected Configuration createConfiguration() {
Configuration conf = super.createConfiguration();
removeBaseAndBucketOverrides(conf,
CANNED_ACL);
- // TODO: Check why we need this ACL? V2 does not have a LOG_DELIVERY_WRITE
ACL which causes
Review Comment:
just needed some acl to set to make sure we were doing it properly; look at
the jira HADOOP-17822
##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/TestS3AProxy.java:
##########
@@ -20,18 +20,17 @@
import java.io.IOException;
-import com.amazonaws.ClientConfiguration;
-import com.amazonaws.Protocol;
import org.assertj.core.api.Assertions;
import org.junit.Test;
+import software.amazon.awssdk.http.apache.ProxyConfiguration;
import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.s3a.impl.AWSClientConfig;
import org.apache.hadoop.test.AbstractHadoopTestBase;
import static org.apache.hadoop.fs.s3a.Constants.PROXY_HOST;
import static org.apache.hadoop.fs.s3a.Constants.PROXY_PORT;
import static org.apache.hadoop.fs.s3a.Constants.PROXY_SECURED;
-import static org.apache.hadoop.fs.s3a.S3AUtils.initProxySupport;
/**
Review Comment:
this is fixed now, right?
##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AEndpointRegion.java:
##########
@@ -21,137 +21,138 @@
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
+import java.util.ArrayList;
+import java.util.List;
-import com.amazonaws.ClientConfiguration;
-import com.amazonaws.client.builder.AwsClientBuilder;
-import com.amazonaws.services.s3.AmazonS3;
-import com.amazonaws.util.AwsHostNameUtils;
import org.assertj.core.api.Assertions;
+import org.junit.Assert;
import org.junit.Test;
+import software.amazon.awssdk.awscore.AwsExecutionAttribute;
+import software.amazon.awssdk.awscore.exception.AwsServiceException;
+import software.amazon.awssdk.core.interceptor.Context;
+import software.amazon.awssdk.core.interceptor.ExecutionAttributes;
+import software.amazon.awssdk.core.interceptor.ExecutionInterceptor;
+import software.amazon.awssdk.services.s3.S3Client;
+import software.amazon.awssdk.services.s3.model.HeadBucketRequest;
+import software.amazon.awssdk.services.s3.model.S3Exception;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.s3a.statistics.impl.EmptyS3AStatisticsContext;
import static org.apache.hadoop.fs.s3a.Constants.AWS_REGION;
-import static org.apache.hadoop.fs.s3a.Constants.AWS_S3_CENTRAL_REGION;
-import static org.apache.hadoop.fs.s3a.Constants.DEFAULT_ENDPOINT;
-import static org.apache.hadoop.fs.s3a.impl.InternalConstants.AWS_REGION_SYSPROP;
-import static org.apache.hadoop.test.LambdaTestUtils.intercept;
+import static org.apache.hadoop.fs.s3a.Constants.BUCKET_REGION_HEADER;
+import static org.apache.hadoop.fs.s3a.Statistic.STORE_REGION_PROBE;
+import static org.apache.hadoop.fs.s3a.impl.InternalConstants.SC_301_MOVED_PERMANENTLY;
/**
* Test to check correctness of S3A endpoint regions in
* {@link DefaultS3ClientFactory}.
*/
public class ITestS3AEndpointRegion extends AbstractS3ATestBase {
- private static final String AWS_REGION_TEST = "test-region";
private static final String AWS_ENDPOINT_TEST = "test-endpoint";
- private static final String AWS_ENDPOINT_TEST_WITH_REGION =
- "test-endpoint.some-region.amazonaws.com";
- public static final String MARS_NORTH_2 = "mars-north-2";
- /**
- * Test to verify that setting a region with the config would bypass the
- * construction of region from endpoint.
- */
- @Test
- public void testWithRegionConfig() {
- getFileSystem().getConf().set(AWS_REGION, AWS_REGION_TEST);
-
- //Creating an endpoint config with a custom endpoint.
- AwsClientBuilder.EndpointConfiguration epr = createEpr(AWS_ENDPOINT_TEST,
- getFileSystem().getConf().getTrimmed(AWS_REGION));
- //Checking if setting region config bypasses the endpoint region.
- Assertions.assertThat(epr.getSigningRegion())
- .describedAs("There is a region mismatch")
- .isEqualTo(getFileSystem().getConf().get(AWS_REGION));
- }
/**
- * Test to verify that not setting the region config, would lead to using
- * endpoint to construct the region.
+ * Test to verify that not setting the region config, will lead to the client factory making
+ * a HEAD bucket call to configure the correct region. If an incorrect region is set, the HEAD
+ * bucket call in this test will raise an exception.
*/
@Test
- public void testWithoutRegionConfig() {
- getFileSystem().getConf().unset(AWS_REGION);
-
- //Creating an endpoint config with a custom endpoint containing a region.
- AwsClientBuilder.EndpointConfiguration eprRandom =
- createEpr(AWS_ENDPOINT_TEST_WITH_REGION,
- getFileSystem().getConf().getTrimmed(AWS_REGION));
- String regionFromEndpoint =
- AwsHostNameUtils
- .parseRegionFromAwsPartitionPattern(AWS_ENDPOINT_TEST_WITH_REGION);
- //Checking if not setting region config leads to constructing the region
- // from endpoint.
- Assertions.assertThat(eprRandom.getSigningRegion())
- .describedAs("There is a region mismatch")
- .isNotEqualTo(getFileSystem().getConf().get(AWS_REGION))
- .isEqualTo(regionFromEndpoint);
- }
+ public void testWithoutRegionConfig() throws IOException {
+ Configuration conf = getConfiguration();
+ String bucket = getFileSystem().getBucket();
+ conf.unset(String.format("fs.s3a.bucket.%s.endpoint.region", bucket));
+ conf.unset(AWS_REGION);
- /**
- * Method to create EndpointConfiguration using an endpoint.
- *
- * @param endpoint the endpoint to be used for EndpointConfiguration creation.
- * @return an instance of EndpointConfiguration.
- */
- private AwsClientBuilder.EndpointConfiguration createEpr(String endpoint,
- String awsRegion) {
- return DefaultS3ClientFactory.createEndpointConfiguration(endpoint,
- new ClientConfiguration(), awsRegion);
+ S3AFileSystem fs = new S3AFileSystem();
+ fs.initialize(getFileSystem().getUri(), conf);
+
+ try {
+ fs.getBucketMetadata();
+ } catch (S3Exception exception) {
+ if (exception.statusCode() == SC_301_MOVED_PERMANENTLY) {
+ List<String> bucketRegion =
+ exception.awsErrorDetails().sdkHttpResponse().headers().get(BUCKET_REGION_HEADER);
+ Assert.fail("Region not configured correctly, expected region: " + bucketRegion);
+ }
+ }
+
+ Assertions.assertThat(fs.getInstrumentation().getCounterValue(STORE_REGION_PROBE))
+ .describedAs("Region is not configured, region probe should have been made").isEqualTo(1);
}
@Test
- public void testInvalidRegionDefaultEndpoint() throws Throwable {
- describe("Create a client with an invalid region and the default
endpoint");
+ public void testWithRegionConfig() throws IOException, URISyntaxException {
Configuration conf = getConfiguration();
- // we are making a big assumption about the timetable for AWS
- // region rollout.
- // if this test ever fails because this region now exists
- // -congratulations!
- conf.set(AWS_REGION, MARS_NORTH_2);
- createMarsNorth2Client(conf);
+ conf.set(AWS_REGION, "us-east-2");
+
+ S3AFileSystem fs = new S3AFileSystem();
+ fs.initialize(new URI("s3a://landsat-pds"), conf);
+
+ Assertions.assertThat(fs.getInstrumentation().getCounterValue(STORE_REGION_PROBE))
+ .describedAs("Region is configured, region probe should not have been made").isEqualTo(0);
}
@Test
- public void testUnsetRegionDefaultEndpoint() throws Throwable {
- describe("Create a client with no region and the default endpoint");
+ public void testRegionCache() throws IOException, URISyntaxException {
Configuration conf = getConfiguration();
conf.unset(AWS_REGION);
- createS3Client(conf, DEFAULT_ENDPOINT, AWS_S3_CENTRAL_REGION);
+ conf.unset("fs.s3a.bucket.landsat-pds.endpoint.region");
+ S3AFileSystem fs = new S3AFileSystem();
+
+ fs.initialize(new URI("s3a://landsat-pds"), conf);
+
+ Assertions.assertThat(fs.getInstrumentation().getCounterValue(STORE_REGION_PROBE))
+ .describedAs("Incorrect number of calls made to get bucket region").isEqualTo(1);
+
+ fs.initialize(new URI("s3a://landsat-pds"), conf);
+
+ // value should already be cached.
+ Assertions.assertThat(fs.getInstrumentation().getCounterValue(STORE_REGION_PROBE))
+ .describedAs("Incorrect number of calls made to get bucket region").isEqualTo(0);
}
- /**
- * By setting the system property {@code "aws.region"} we can
- * guarantee that the SDK region resolution chain will always succeed
- * (and fast).
- * Clearly there is no validation of the region during the build process.
- */
@Test
- public void testBlankRegionTriggersSDKResolution() throws Throwable {
- describe("Create a client with a blank region and the default endpoint."
- + " This will trigger the SDK Resolution chain");
+ public void testEndpointOverride() throws Throwable {
+ describe("Create a client with no region and the default endpoint");
Configuration conf = getConfiguration();
- conf.set(AWS_REGION, "");
- System.setProperty(AWS_REGION_SYSPROP, MARS_NORTH_2);
+
+ S3Client client = createS3Client(conf, AWS_ENDPOINT_TEST);
+
try {
- createMarsNorth2Client(conf);
- } finally {
- System.clearProperty(AWS_REGION_SYSPROP);
+ client.headBucket(HeadBucketRequest.builder().bucket(getFileSystem().getBucket()).build());
+ } catch (AwsServiceException exception) {
+ // Expected to be thrown by interceptor, do nothing.
Review Comment:
if you use intercept(), an exception is raised if the expected exception isn't
actually thrown. do you want that?
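for reference, the `intercept()` form (from `org.apache.hadoop.test.LambdaTestUtils`, whose static import this patch removed):
```java
// Fails the test if AwsServiceException is NOT raised, instead of
// silently passing when the interceptor never fires.
intercept(AwsServiceException.class, () ->
    client.headBucket(HeadBucketRequest.builder()
        .bucket(getFileSystem().getBucket())
        .build()));
```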
##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AEndpointRegion.java:
##########
@@ -21,137 +21,138 @@
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
+import java.util.ArrayList;
+import java.util.List;
-import com.amazonaws.ClientConfiguration;
-import com.amazonaws.client.builder.AwsClientBuilder;
-import com.amazonaws.services.s3.AmazonS3;
-import com.amazonaws.util.AwsHostNameUtils;
import org.assertj.core.api.Assertions;
+import org.junit.Assert;
import org.junit.Test;
+import software.amazon.awssdk.awscore.AwsExecutionAttribute;
+import software.amazon.awssdk.awscore.exception.AwsServiceException;
+import software.amazon.awssdk.core.interceptor.Context;
+import software.amazon.awssdk.core.interceptor.ExecutionAttributes;
+import software.amazon.awssdk.core.interceptor.ExecutionInterceptor;
+import software.amazon.awssdk.services.s3.S3Client;
+import software.amazon.awssdk.services.s3.model.HeadBucketRequest;
+import software.amazon.awssdk.services.s3.model.S3Exception;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.s3a.statistics.impl.EmptyS3AStatisticsContext;
import static org.apache.hadoop.fs.s3a.Constants.AWS_REGION;
-import static org.apache.hadoop.fs.s3a.Constants.AWS_S3_CENTRAL_REGION;
-import static org.apache.hadoop.fs.s3a.Constants.DEFAULT_ENDPOINT;
-import static org.apache.hadoop.fs.s3a.impl.InternalConstants.AWS_REGION_SYSPROP;
-import static org.apache.hadoop.test.LambdaTestUtils.intercept;
+import static org.apache.hadoop.fs.s3a.Constants.BUCKET_REGION_HEADER;
+import static org.apache.hadoop.fs.s3a.Statistic.STORE_REGION_PROBE;
+import static org.apache.hadoop.fs.s3a.impl.InternalConstants.SC_301_MOVED_PERMANENTLY;
/**
* Test to check correctness of S3A endpoint regions in
* {@link DefaultS3ClientFactory}.
*/
public class ITestS3AEndpointRegion extends AbstractS3ATestBase {
- private static final String AWS_REGION_TEST = "test-region";
private static final String AWS_ENDPOINT_TEST = "test-endpoint";
- private static final String AWS_ENDPOINT_TEST_WITH_REGION =
- "test-endpoint.some-region.amazonaws.com";
- public static final String MARS_NORTH_2 = "mars-north-2";
- /**
- * Test to verify that setting a region with the config would bypass the
- * construction of region from endpoint.
- */
- @Test
- public void testWithRegionConfig() {
- getFileSystem().getConf().set(AWS_REGION, AWS_REGION_TEST);
-
- //Creating an endpoint config with a custom endpoint.
- AwsClientBuilder.EndpointConfiguration epr = createEpr(AWS_ENDPOINT_TEST,
- getFileSystem().getConf().getTrimmed(AWS_REGION));
- //Checking if setting region config bypasses the endpoint region.
- Assertions.assertThat(epr.getSigningRegion())
- .describedAs("There is a region mismatch")
- .isEqualTo(getFileSystem().getConf().get(AWS_REGION));
- }
/**
- * Test to verify that not setting the region config, would lead to using
- * endpoint to construct the region.
+ * Test to verify that not setting the region config, will lead to the client factory making
+ * a HEAD bucket call to configure the correct region. If an incorrect region is set, the HEAD
+ * bucket call in this test will raise an exception.
*/
@Test
- public void testWithoutRegionConfig() {
- getFileSystem().getConf().unset(AWS_REGION);
-
- //Creating an endpoint config with a custom endpoint containing a region.
- AwsClientBuilder.EndpointConfiguration eprRandom =
- createEpr(AWS_ENDPOINT_TEST_WITH_REGION,
- getFileSystem().getConf().getTrimmed(AWS_REGION));
- String regionFromEndpoint =
- AwsHostNameUtils
- .parseRegionFromAwsPartitionPattern(AWS_ENDPOINT_TEST_WITH_REGION);
- //Checking if not setting region config leads to constructing the region
- // from endpoint.
- Assertions.assertThat(eprRandom.getSigningRegion())
- .describedAs("There is a region mismatch")
- .isNotEqualTo(getFileSystem().getConf().get(AWS_REGION))
- .isEqualTo(regionFromEndpoint);
- }
+ public void testWithoutRegionConfig() throws IOException {
+ Configuration conf = getConfiguration();
+ String bucket = getFileSystem().getBucket();
+ conf.unset(String.format("fs.s3a.bucket.%s.endpoint.region", bucket));
+ conf.unset(AWS_REGION);
- /**
- * Method to create EndpointConfiguration using an endpoint.
- *
- * @param endpoint the endpoint to be used for EndpointConfiguration creation.
- * @return an instance of EndpointConfiguration.
- */
- private AwsClientBuilder.EndpointConfiguration createEpr(String endpoint,
- String awsRegion) {
- return DefaultS3ClientFactory.createEndpointConfiguration(endpoint,
- new ClientConfiguration(), awsRegion);
+ S3AFileSystem fs = new S3AFileSystem();
+ fs.initialize(getFileSystem().getUri(), conf);
+
+ try {
+ fs.getBucketMetadata();
+ } catch (S3Exception exception) {
+ if (exception.statusCode() == SC_301_MOVED_PERMANENTLY) {
+ List<String> bucketRegion =
+ exception.awsErrorDetails().sdkHttpResponse().headers().get(BUCKET_REGION_HEADER);
+ Assert.fail("Region not configured correctly, expected region: " + bucketRegion);
Review Comment:
what about including the full toString() of the exception?
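e.g., keeping the header hint but adding the raised exception:
```java
Assert.fail("Region not configured correctly, expected region: "
    + bucketRegion + " raised: " + exception);
```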
> AWS SDK V2 - Complete outstanding items
> ---------------------------------------
>
> Key: HADOOP-18565
> URL: https://issues.apache.org/jira/browse/HADOOP-18565
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.4.0
> Reporter: Ahmar Suhail
> Priority: Major
> Labels: pull-request-available
>
> The following work remains to complete the SDK upgrade work:
> * S3A allows users configure to custom signers, add in support for this.
> * Remove SDK V1 bundle dependency
> * Update `getRegion()` logic to use retries.
> * Add in progress listeners for `S3ABlockOutputStream`
> * Fix any failing tests.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]