HarshitGupta11 opened a new pull request, #6482: URL: https://github.com/apache/hadoop/pull/6482
If both fs.s3a.endpoint & fs.s3a.endpoint.region are empty, Spark will set fs.s3a.endpoint to s3.amazonaws.com here:https://github.com/apache/spark/blob/9a2f39318e3af8b3817dc5e4baf52e548d82063c/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L540 HADOOP-18908, updated the region logic such that if fs.s3a.endpoint.region is set, or if a region can be parsed from fs.s3a.endpoint (which will happen in this case, region will be US_EAST_1), cross region access is not enabled. This will cause 400 errors if the bucket is not in US_EAST_1. Proposed: Updated the logic so that if the endpoint is the global s3.amazonaws.com , cross region access is enabled. ### Description of PR ### How was this patch tested? Its being currently tested against us-west-2 by explicitly setting the endpoint as s3.amazonaws.com ### For code changes: - [x] Does the title or this PR starts with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')? - [ ] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation? - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, `NOTICE-binary` files? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
