jorisvandenbossche commented on code in PR #14599:
URL: https://github.com/apache/arrow/pull/14599#discussion_r1016340805
##########
docs/source/python/filesystems.rst:
##########
@@ -178,11 +178,19 @@ Example how you can read contents from a S3 bucket::
>>> f.readall()
b'some data'
+ # Note, to resolve the S3 bucket, one can use `resolve_s3_region('my-test-bucket')`
Review Comment:
I would move this to an actual text paragraph instead of somewhat "hiding" it
in a comment inside a longer example. It may also help to state first that the
S3FileSystem needs to be configured with the correct region for the bucket you
want to read. Then, for users who want that to happen automatically instead of
passing the region manually (as in the example above), explain and show that
there are two ways to do it.
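
To make the manual versus automatic distinction concrete, here is a minimal sketch, not the wording proposed for the docs. `pick_region` and its `resolve` parameter are hypothetical names introduced only for illustration; the only pyarrow call assumed is `pyarrow.fs.resolve_s3_region`, which needs a pyarrow build with S3 support and is expected to contact S3 to look up the bucket.

```python
def pick_region(bucket, region=None, resolve=None):
    """Return the region to pass to S3FileSystem for `bucket`.

    region  : pass a value to configure the region manually.
    resolve : hypothetical hook for the lookup; when omitted,
              pyarrow.fs.resolve_s3_region is used.
    """
    if region is not None:
        # Manual configuration: the caller already knows the region.
        return region
    if resolve is None:
        # Imported lazily so the manual path works without S3 support.
        from pyarrow.fs import resolve_s3_region as resolve
    # Automatic configuration: look the region up from the bucket name.
    return resolve(bucket)
```

The result would then be passed on as `fs.S3FileSystem(region=pick_region('my-test-bucket'))`. The second automatic route, `from_uri`, skips this step entirely because it infers the region itself.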
##########
python/pyarrow/_s3fs.pyx:
##########
@@ -174,7 +174,8 @@ cdef class S3FileSystem(FileSystem):
The frequency (in seconds) with which temporary credentials from an
assumed role session will be refreshed.
region : str, default 'us-east-1'
- AWS region to connect to.
+ AWS region to connect to. Use :func:`pyarrow.fs.resolve_s3_region` to
Review Comment:
The default nowadays is not always "us-east-1"; it can also depend on some
configuration / environment variables. See the expanded section in the C++ code
about this:
https://github.com/apache/arrow/blob/5889c78e344688f8fa8100ecdf254cd701ee3445/cpp/src/arrow/filesystem/s3fs.h#L105-L109
That may be too detailed, but I wonder whether we should update the description
here a bit.
##########
docs/source/python/filesystems.rst:
##########
@@ -178,11 +178,19 @@ Example how you can read contents from a S3 bucket::
>>> f.readall()
b'some data'
+ # Note, to resolve the S3 bucket, one can use `resolve_s3_region('my-test-bucket')`
+ >>> s3 = fs.S3FileSystem(region=resolve_s3_region('my-test-bucket'))
+
+ # Or alternatively...
+ >>> s3 = fs.S3FileSystem.from_uri('s3://[access_key:secret_key@]bucket/path[?region=]')
Review Comment:
```suggestion
>>> s3, path = fs.S3FileSystem.from_uri('s3://[access_key:secret_key@]bucket/path[?region=]')
```
`from_uri` returns two elements (the filesystem and the path within it), so the
result needs to be unpacked.
Also, maybe leave out the `[?region=]` part, since the goal here is to show
that `from_uri` will infer the region from the bucket name (without the need to
specify it manually).
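
For illustration, a small sketch of how the unpacked pair might be used. `read_object` and its `opener` parameter are hypothetical names, added here so the snippet can be exercised without network access; the pyarrow calls assumed are `FileSystem.from_uri` and `open_input_stream`.

```python
def read_object(uri, opener=None):
    """Read the full contents of the object that `uri` points at.

    opener : hypothetical hook returning (filesystem, path) for a URI;
             when omitted, pyarrow.fs.FileSystem.from_uri is used.
    """
    if opener is None:
        # Imported lazily: S3 URIs need pyarrow built with S3 support.
        from pyarrow import fs
        opener = fs.FileSystem.from_uri
    # from_uri returns two elements, so unpack both of them.
    filesystem, path = opener(uri)
    with filesystem.open_input_stream(path) as f:
        return f.read()
```

With a reachable bucket, `read_object('s3://my-test-bucket/...')` would not need a `region` argument anywhere, which is the point the docs example should make.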