[
https://issues.apache.org/jira/browse/HADOOP-19044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17812281#comment-17812281
]
ASF GitHub Bot commented on HADOOP-19044:
-----------------------------------------
ahmarsuhail commented on code in PR #6479:
URL: https://github.com/apache/hadoop/pull/6479#discussion_r1471123903
##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/DefaultS3ClientFactory.java:
##########
@@ -289,17 +290,36 @@ private <BuilderT extends S3BaseClientBuilder<BuilderT,
ClientT>, ClientT> void
builder.fipsEnabled(fipsEnabled);
if (endpoint != null) {
+ boolean overrideEndpoint = true;
checkArgument(!fipsEnabled,
"%s : %s", ERROR_ENDPOINT_WITH_FIPS, endpoint);
- builder.endpointOverride(endpoint);
- // No region was configured, try to determine it from the endpoint.
- if (region == null) {
- region = getS3RegionFromEndpoint(parameters.getEndpoint());
+ boolean endpointEndsWithCentral =
+ endpointStr.endsWith(CENTRAL_ENDPOINT);
+ // No region was configured or the endpoint is central,
+ // determine the region from the endpoint.
+ if (region == null || endpointEndsWithCentral) {
Review Comment:
hmm, not sure about this. now we're parsing region if region is null or
endpoint = s3.amazonaws.com.
So if you set `s3.amazonaws.com` and region to eu-west-2, you still end up
with us setting the region to `us-east-2` and cross region enabled. My thinking
here is that a lot of people may have endpoint set to s3.amazonaws.com (as
atleast with SDK V1 it was harmless to do that I think) .
we only want to get into this parsing if region == null. so let's revert to
the previous condition here. And then we never don't want to override if the
endpoint is s3.amazonaws.com. Suggested:
```
if (endpoint != null) {
checkArgument(!fipsEnabled,
"%s : %s", ERROR_ENDPOINT_WITH_FIPS, endpoint);
boolean endpointEndsWithCentral =
endpointStr.endsWith(CENTRAL_ENDPOINT);
// No region was configured or the endpoint is central,
// determine the region from the endpoint.
if (region == null) {
region = getS3RegionFromEndpoint(endpointStr,
endpointEndsWithCentral);
if (region != null) {
origin = "endpoint";
if (endpointEndsWithCentral) {
builder.crossRegionAccessEnabled(true);
origin = "origin with cross region access";
LOG.debug("Enabling cross region access for endpoint {}",
endpointStr);
}
}
}
// No need to override endpoint with "s3.amazonaws.com".
// Let the client take care of endpoint resolution. Overriding
// the endpoint with "s3.amazonaws.com" causes 400 Bad Request
// errors for non-existent buckets and objects.
// ref: https://github.com/aws/aws-sdk-java-v2/issues/4846
if (!endpointEndsWithCentral) {
builder.endpointOverride(endpoint);
LOG.debug("Setting endpoint to {}", endpoint);
}
}
```
So now:
1) if endpoint = s3.amazonaws.com and region is null, set to US_EAST_2 and
enable cross region, and don't override endpoint.
2) if endpoint = s3.amazonaw.com and region is set (eg to eu-west-1), set
region but do not override endpoint..let SDK figure it out
##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AEndpointRegion.java:
##########
@@ -146,7 +150,21 @@ public void testCentralEndpoint() throws Throwable {
describe("Create a client with the central endpoint");
Configuration conf = getConfiguration();
- S3Client client = createS3Client(conf, CENTRAL_ENDPOINT, null, US_EAST_1,
false);
+ S3Client client = createS3Client(conf, CENTRAL_ENDPOINT, null, US_EAST_2,
false);
+
+ expectInterceptorException(client);
+ }
+
+ @Test
+ public void testCentralEndpointWithRegion() throws Throwable {
+ describe("Create a client with the central endpoint but also specify
region");
+ Configuration conf = getConfiguration();
+
+ S3Client client = createS3Client(conf, CENTRAL_ENDPOINT, US_WEST_2,
US_EAST_2, false);
Review Comment:
for example here, if configured region is US_WEST_2, expected region should
also be US_EAST_2
> AWS SDK V2 - Update S3A region logic
> -------------------------------------
>
> Key: HADOOP-19044
> URL: https://issues.apache.org/jira/browse/HADOOP-19044
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.4.0
> Reporter: Ahmar Suhail
> Assignee: Viraj Jasani
> Priority: Major
> Labels: pull-request-available
>
> If both fs.s3a.endpoint & fs.s3a.endpoint.region are empty, Spark will set
> fs.s3a.endpoint to
> s3.amazonaws.com here:
> [https://github.com/apache/spark/blob/9a2f39318e3af8b3817dc5e4baf52e548d82063c/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L540]
>
>
> HADOOP-18908, updated the region logic such that if fs.s3a.endpoint.region is
> set, or if a region can be parsed from fs.s3a.endpoint (which will happen in
> this case, region will be US_EAST_1), cross region access is not enabled.
> This will cause 400 errors if the bucket is not in US_EAST_1.
>
> Proposed: Updated the logic so that if the endpoint is the global
> s3.amazonaws.com , cross region access is enabled.
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]