rpep opened a new issue, #49949:
URL: https://github.com/apache/arrow/issues/49949
### Describe the bug, including details regarding any error messages, version, and platform.
If you have a bucket `s3://mybucket` containing prefixes that your IAM role
allows you to write to, e.g. `s3://mybucket/allowed_dir`, you cannot currently
use `S3FileSystem::CreateDir(path, recursive=true)`, because it first issues a
HeadBucket call on the bucket, and that call is denied.
For example:
```
(base) ray@ryan-test-head-wkzmw:~$ python
Python 3.12.9 | packaged by Anaconda, Inc. | (main, Feb 6 2025, 18:56:27)
[GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow
>>> import pyarrow.fs as fs
>>> s3 = fs.S3FileSystem()
>>> s3.create_dir("test-bucket/pepperr/test", recursive=True)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "pyarrow/_fs.pyx", line 638, in pyarrow._fs.FileSystem.create_dir
File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
OSError: When testing for existence of bucket 'test-bucket': AWS Error
ACCESS_DENIED during HeadBucket operation: No response body.
```
In this case the bucket exists, but the user is not allowed to perform the HEAD
operation on the bucket itself. Nonetheless, the user is able to write objects
under this prefix, so they should not be blocked by this check:
```
>>> import boto3
>>> boto3.client("s3").put_object(Bucket="test-bucket",
...     Key="pepperr/test/probe.txt", Body=b"hello")
{'ResponseMetadata': {'RequestId': 'FXJ3YG658VKQ8725', 'HostId':
'37ezZQHJ0KzbBAQEU6uuDYWIIBa9XpNqs/E6neTpT/8XDpTIh7TE6hBXrHJDT+19STrC6QRcUKlp1zqeIMlLfCnvU/lPvtCp',
'HTTPStatusCode': 200, ....}
```
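For context, a policy shaped roughly like the one below would produce this combination of results. This is only a sketch under assumptions: the actual policy is not shown in this report, and the bucket name and prefix are the placeholders used above. HeadBucket requires the `s3:ListBucket` permission on the bucket, so a role that is only granted object-level actions under a prefix can write there but cannot HEAD the bucket.

```python
import json

# Hypothetical policy for illustration; the real policy is not part of this report.
# It grants object-level access under one prefix and nothing on the bucket itself.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:PutObject", "s3:GetObject"],
            # Object ARNs under the allowed prefix only.
            "Resource": "arn:aws:s3:::mybucket/allowed_dir/*",
        }
        # No statement grants s3:ListBucket on arn:aws:s3:::mybucket,
        # so HeadBucket on the bucket is denied.
    ],
}

print(json.dumps(policy, indent=2))
```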
I think the check comes from here: when `recursive=True`, the code first
checks whether the bucket exists:
https://github.com/apache/arrow/blob/ebaaf07adbd302e95e393b5b77d78c1c97ea3b70/cpp/src/arrow/filesystem/s3fs.cc#L3162-L3180
This needs to be more permissive: rather than performing an explicit check
for bucket existence, the S3 call should simply be issued, and any failure
reported up the stack.
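To illustrate the difference, here is a minimal Python sketch of the two behaviours. It is not the Arrow implementation: `FakeS3Client`, `S3Error`, `create_dir_strict` and `create_dir_permissive` are hypothetical names, the client is an in-memory stand-in, and only the leaf directory marker is created (no parent handling).

```python
class S3Error(Exception):
    """Hypothetical error type carrying an S3-style error code."""

    def __init__(self, code, operation):
        super().__init__(f"AWS Error {code} during {operation} operation")
        self.code = code
        self.operation = operation


class FakeS3Client:
    """In-memory stand-in for an S3 client; not a real AWS or Arrow API.

    HeadBucket is always denied and PutObject is only allowed under one
    prefix, mirroring the permissions described in this issue.
    """

    def __init__(self, bucket, writable_prefix):
        self.bucket = bucket
        self.writable_prefix = writable_prefix
        self.keys = set()

    def head_bucket(self, bucket):
        raise S3Error("ACCESS_DENIED", "HeadBucket")

    def put_object(self, bucket, key):
        if bucket != self.bucket:
            raise S3Error("NO_SUCH_BUCKET", "PutObject")
        if not key.startswith(self.writable_prefix):
            raise S3Error("ACCESS_DENIED", "PutObject")
        self.keys.add(key)


def create_dir_strict(client, path):
    """Behaviour described in this issue: check the bucket before writing."""
    bucket, _, key = path.partition("/")
    client.head_bucket(bucket)  # raises before any write is attempted
    client.put_object(bucket, key.rstrip("/") + "/")


def create_dir_permissive(client, path):
    """Proposed behaviour: issue the write and let S3 report any failure."""
    bucket, _, key = path.partition("/")
    client.put_object(bucket, key.rstrip("/") + "/")


client = FakeS3Client("test-bucket", "pepperr/")

try:
    create_dir_strict(client, "test-bucket/pepperr/test")
except S3Error as exc:
    print("strict:", exc)

create_dir_permissive(client, "test-bucket/pepperr/test")
print("permissive:", sorted(client.keys))
```

With the permissive variant, a bucket that really does not exist still produces an error, but it comes from the write call itself rather than from a separate existence check.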
### Component(s)
C++