[
https://issues.apache.org/jira/browse/HADOOP-19044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17810982#comment-17810982
]
ASF GitHub Bot commented on HADOOP-19044:
-----------------------------------------
virajjasani commented on PR #6479:
URL: https://github.com/apache/hadoop/pull/6479#issuecomment-1910667743
> > On the other hand, on any existing bucket from the other region (e.g. us-west-2)
>
> do you mean for here you set the region to us-west-2 in fs.s3a.endpoint.region for this?
Not really. I meant that with the above combination, i.e. endpoint
`s3.amazonaws.com`, no region specified for `fs.s3a.endpoint.region`, and
this patch setting `us-east-2` with cross-region access enabled internally,
the client is able to perform all operations on an existing bucket from
another region. Only when the bucket is not present does it return 400
instead of 404. If the bucket is present, headBucket goes well. Object
operations behave similarly: if the object is not present and we call
`fs#exists`, it fails with 400 instead of 404. If the object exists,
headObject goes well.
Hence, tests that perform file system CRUD operations on a real existing
bucket from another region pass without any issues **with this patch**
and these settings:
1. `fs.s3a.endpoint` = `s3.amazonaws.com`
2. Nothing set for `fs.s3a.endpoint.region`, which internally results
in `us-east-2` with cross-region access (as per this patch).
**Without this patch**, file system CRUD operations fail on a real existing
bucket from another region, which is expected.
e.g.
```
org.apache.hadoop.fs.s3a.AWSBadRequestException: getFileStatus on s3a://${bucket}/user/${user}/${dir-path}: software.amazon.awssdk.services.s3.model.S3Exception: The authorization header is malformed; the region 'us-east-2' is wrong; expecting 'us-west-2' (Service: S3, Status Code: 400, Request ID: G85CNFC579T4MJ76, Extended Request ID: xrYGGqXdYtr72cYyFN3v4yemDxBCYkdt8mYd8cGItNhdx1EmZMLxMhwJTwzmWZT6ershid/WT4w=):AuthorizationHeaderMalformed: The authorization header is malformed; the region 'us-east-2' is wrong; expecting 'us-west-2' (Service: S3, Status Code: 400, Request ID: G85CNFC579T4MJ76, Extended Request ID: xrYGGqXdYtr72cYyFN3v4yemDxBCYkdt8mYd8cGItNhdx1EmZMLxMhwJTwzmWZT6ershid/WT4w=)
    at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:259)
    at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:154)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:4075)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:3934)
    at org.apache.hadoop.fs.s3a.S3AFileSystem$MkdirOperationCallbacksImpl.probePathStatus(S3AFileSystem.java:3806)
    at org.apache.hadoop.fs.s3a.impl.MkdirOperation.probePathStatusOrNull(MkdirOperation.java:173)
    at org.apache.hadoop.fs.s3a.impl.MkdirOperation.getPathStatusExpectingDir(MkdirOperation.java:194)
    at org.apache.hadoop.fs.s3a.impl.MkdirOperation.execute(MkdirOperation.java:108)
    at org.apache.hadoop.fs.s3a.impl.MkdirOperation.execute(MkdirOperation.java:57)
    at org.apache.hadoop.fs.s3a.impl.ExecutingStoreOperation.apply(ExecutingStoreOperation.java:76)
    at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.invokeTrackingDuration(IOStatisticsBinding.java:547)
    at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:528)
    at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDuration(IOStatisticsBinding.java:449)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2719)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2738)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.mkdirs(S3AFileSystem.java:3778)
    at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:2494)
```
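When triaging a failure like the one above, it can help to pull the expected region out of the error text. The following is only a minimal sketch: the class and method names are hypothetical (they are not part of S3A or the AWS SDK), and it assumes the message keeps the "the region '...' is wrong; expecting '...'" wording seen in this trace.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical triage helper; not part of S3A or the AWS SDK. */
class RegionMismatchSketch {

  // Assumes the wording seen in the trace:
  //   the region 'X' is wrong; expecting 'Y'
  private static final Pattern MISMATCH = Pattern.compile(
      "the region '([^']+)' is wrong;\\s+expecting '([^']+)'");

  /** Returns the region S3 says it expected, if the message contains one. */
  static Optional<String> expectedRegion(String message) {
    if (message == null) {
      return Optional.empty();
    }
    Matcher m = MISMATCH.matcher(message);
    return m.find() ? Optional.of(m.group(2)) : Optional.empty();
  }

  public static void main(String[] args) {
    String msg = "The authorization header is malformed; "
        + "the region 'us-east-2' is wrong; expecting 'us-west-2'";
    // Prints the expected region, or "unknown" if the text does not match.
    System.out.println(expectedRegion(msg).orElse("unknown"));
  }
}
```

For the message in the trace this yields `us-west-2`, i.e. the region the bucket actually lives in.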
> AWS SDK V2 - Update S3A region logic
> -------------------------------------
>
> Key: HADOOP-19044
> URL: https://issues.apache.org/jira/browse/HADOOP-19044
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.4.0
> Reporter: Ahmar Suhail
> Assignee: Viraj Jasani
> Priority: Major
> Labels: pull-request-available
>
> If both fs.s3a.endpoint & fs.s3a.endpoint.region are empty, Spark will set
> fs.s3a.endpoint to s3.amazonaws.com here:
> [https://github.com/apache/spark/blob/9a2f39318e3af8b3817dc5e4baf52e548d82063c/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L540]
>
>
> HADOOP-18908 updated the region logic such that if fs.s3a.endpoint.region is
> set, or if a region can be parsed from fs.s3a.endpoint (which happens in
> this case: the region will be US_EAST_1), cross-region access is not enabled.
> This causes 400 errors if the bucket is not in US_EAST_1.
>
> Proposed: update the logic so that if the endpoint is the global
> s3.amazonaws.com, cross-region access is enabled.
>
>
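The decision described in the issue can be sketched roughly as follows. This is an illustration under stated assumptions, not the S3A implementation: the class, the `Resolution` type and the `resolve` method are hypothetical names, the `s3.<region>.amazonaws.com` endpoint shape is assumed, and `us-east-2` is simply the region the PR comment reports the patch using for the global endpoint. Combinations the thread does not describe are rejected rather than guessed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of the region decision; not the actual S3A code. */
class RegionResolutionSketch {

  /** Region to sign with, plus whether cross-region access is enabled. */
  static final class Resolution {
    final String region;
    final boolean crossRegion;

    Resolution(String region, boolean crossRegion) {
      this.region = region;
      this.crossRegion = crossRegion;
    }
  }

  static final String GLOBAL_ENDPOINT = "s3.amazonaws.com";

  // Region the PR comment says the patch sets for the global endpoint.
  static final String GLOBAL_ENDPOINT_REGION = "us-east-2";

  // Assumed regional endpoint shape: s3.<region>.amazonaws.com
  private static final Pattern REGIONAL_ENDPOINT =
      Pattern.compile("^s3\\.([a-z0-9-]+)\\.amazonaws\\.com$");

  static Resolution resolve(String endpoint, String configuredRegion) {
    // 1. An explicit fs.s3a.endpoint.region wins; no cross-region access.
    if (configuredRegion != null && !configuredRegion.isEmpty()) {
      return new Resolution(configuredRegion, false);
    }
    // 2. Proposed change: the global endpoint enables cross-region access.
    if (GLOBAL_ENDPOINT.equals(endpoint)) {
      return new Resolution(GLOBAL_ENDPOINT_REGION, true);
    }
    // 3. A region parsed from the endpoint is used; no cross-region access.
    if (endpoint != null) {
      Matcher m = REGIONAL_ENDPOINT.matcher(endpoint);
      if (m.matches()) {
        return new Resolution(m.group(1), false);
      }
    }
    // Other combinations are not described in this thread.
    throw new IllegalArgumentException("Unhandled endpoint: " + endpoint);
  }

  public static void main(String[] args) {
    Resolution r = resolve("s3.amazonaws.com", "");
    System.out.println(r.region + " crossRegion=" + r.crossRegion);
  }
}
```

Checking the global endpoint before attempting to parse a region is the point of the proposal: without that special case, `s3.amazonaws.com` would be treated as a fixed region and requests to buckets elsewhere would fail with 400.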