cboettig commented on PR #33918: URL: https://github.com/apache/arrow/pull/33918#issuecomment-1407793898
Okay, I see in the test suite that the test now fails at this line: https://github.com/apache/arrow/blob/master/r/tests/testthat/test-filesystem.R#L184

This occurs because

```r
fs <- FileSystem$from_uri("s3://ursa-labs-r-test")
```

infers the region from the bucket, while `S3FileSystem$create()` obeys the configured region assignment (i.e. passed as an argument, specified in an environment variable, or configured in a `.aws` configuration file, then falling back on the default `us-east-1`).

If desired, I could add two lines to auto-detect this region, which is returned in the response header, i.e.

```
curl -I https://s3.amazonaws.com/ursa-labs-r-test/
```

Note that while the AWS S3 API provides get-bucket-location, that call requires permission, whereas no permission is required to get the bucket location from the header. AWS's own docs say to use the HEAD method to detect the region on the v4 API anyway: https://docs.aws.amazon.com/AmazonS3/latest/API/API_HeadBucket.html

It's not totally clear that such guessing of the region, rather than obeying the same conventions as `S3FileSystem`, is really the right choice though. It may be worth a closer comparison to other interfaces to see how they handle this.

I believe there are valid use cases, e.g. where a bucket is mirrored across many regions, and a user wants to detect the region *in which the code is executing*, rather than guess the region from the bucket info (which, iiuc, would result in a machine in a US-west region, say, using data from the default us-east-1 region, rather than realizing that the bucket was mirrored in the local region). I'm not an expert on AWS and the handling of regions, but if the region were always intended to be inferred only from the bucket name, then AWS would have no reason to expose the configuration option in the first place.
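For the header-based detection mentioned above, here is a minimal sketch of pulling the region out of the response. It assumes the region comes back in the `x-amz-bucket-region` response header; the helper name `bucket_region_from_headers` is hypothetical and only for illustration, not something in arrow.

```shell
# Hypothetical helper: reads raw HTTP response headers on stdin and prints
# the value of the x-amz-bucket-region header, if one is present.
bucket_region_from_headers() {
  # Split each line on ": ", match the header name case-insensitively,
  # strip the trailing carriage return, and stop at the first match.
  awk -F': *' 'tolower($1) == "x-amz-bucket-region" {
    sub(/\r$/, "", $2)
    print $2
    exit
  }'
}

# Usage (needs network access; no credentials are passed here):
#   curl -sI https://s3.amazonaws.com/ursa-labs-r-test/ | bucket_region_from_headers
```

If no such header is present the function prints nothing, so a caller could fall back to the configured region in that case.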
