cboettig commented on PR #33918:
URL: https://github.com/apache/arrow/pull/33918#issuecomment-1407793898

   Okay, I see that the test suite now fails at this line: 
https://github.com/apache/arrow/blob/master/r/tests/testthat/test-filesystem.R#L184
   
   This occurs because 
   
   ```r
   fs <- FileSystem$from_uri("s3://ursa-labs-r-test")
   ```
   
   infers the region from the bucket, while `S3FileSystem$create()` 
obeys the configured region assignment (i.e. passed as an argument, specified in 
an environment variable, or configured in a `.aws` configuration file, then 
falling back on the default `us-east-1`).
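   
   As a rough illustration of that fallback order (not arrow's actual 
implementation, since the real lookup is done by the underlying AWS client 
library), here is a minimal sketch. `resolve_region()` and its `config_region` 
argument are hypothetical, and the environment variable name 
`AWS_DEFAULT_REGION` is an assumption on my part:
   
   ```r
   # Hypothetical helper illustrating the fallback order described above.
   # `config_region` stands in for a value read from an AWS configuration
   # file; parsing that file is out of scope for this sketch.
   resolve_region <- function(region = NULL, config_region = NULL,
                              env_var = "AWS_DEFAULT_REGION") {
     env_region <- Sys.getenv(env_var, unset = "")
     if (!is.null(region)) {
       region            # 1. explicit argument
     } else if (nzchar(env_region)) {
       env_region        # 2. environment variable (assumed name)
     } else if (!is.null(config_region)) {
       config_region     # 3. configuration file
     } else {
       "us-east-1"       # 4. default
     }
   }
   ```
   
   Note that the bucket never enters this lookup, which is why the two 
constructors can disagree about the region.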
   
   If desired, I could add two lines to auto-detect this region, which is 
returned in the response header, i.e.
   
   ```
   curl -I https://s3.amazonaws.com/ursa-labs-r-test/
   ```
    
   Note that while the AWS S3 API provides get-bucket-location, this requires 
permission, whereas no permission is required to get the bucket location from 
the header. AWS's own docs say to use the HEAD method to detect the region on 
the v4 API anyway: 
https://docs.aws.amazon.com/AmazonS3/latest/API/API_HeadBucket.html
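   
   A minimal sketch of what that detection could look like from R, using only 
base R's `curlGetHeaders()` to issue the HEAD request. The helper names 
`parse_bucket_region()` and `detect_bucket_region()` are hypothetical, and I'm 
assuming the region arrives in the `x-amz-bucket-region` response header:
   
   ```r
   # Extract the region from raw HTTP header lines such as those printed by
   # `curl -I`. Returns NA_character_ if the header is absent.
   parse_bucket_region <- function(header_lines) {
     pattern <- "^x-amz-bucket-region:[[:space:]]*"
     hit <- grep(pattern, header_lines, ignore.case = TRUE, value = TRUE)
     if (length(hit) == 0) {
       return(NA_character_)
     }
     # If several header blocks are present (e.g. after a redirect),
     # keep the value from the last one.
     trimws(sub(pattern, "", hit[length(hit)], ignore.case = TRUE))
   }
   
   # Hypothetical helper: send an unauthenticated HEAD request for the
   # bucket and read the region from the response headers.
   detect_bucket_region <- function(bucket) {
     url <- sprintf("https://s3.amazonaws.com/%s/", bucket)
     parse_bucket_region(curlGetHeaders(url))
   }
   ```
   
   The result could then be passed on explicitly, e.g. 
`S3FileSystem$create(region = detect_bucket_region("ursa-labs-r-test"))`, 
though that needs network access and I have not checked how it behaves for 
every bucket configuration.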
   
    
   It's not totally clear that such guessing of the region, rather than obeying 
the same conventions as S3FileSystem, is really the right choice though.  It 
may be worth a closer comparison to other interfaces to see how they handle 
this.  I believe there are valid use cases, e.g. where a bucket is mirrored 
across many regions, and a user wants to detect the region *in which the code 
is executing*, rather than guess the region from the bucket info (which iiuc, 
would result in a machine in US-west region, say, using data from the default 
us-east-1 region, rather than realizing that the bucket was mirrored in the 
local region).  
   
   I'm not an expert on AWS and handling of regions, but if the region was 
always intended to be inferred only from the bucket name then AWS would have no 
reason to expose the configuration option in the first place.  
   


