pitrou commented on code in PR #13633:
URL: https://github.com/apache/arrow/pull/13633#discussion_r934598841
##########
cpp/src/arrow/filesystem/s3fs.h:
##########
@@ -152,6 +152,10 @@ struct ARROW_EXPORT S3Options {
S3Options();
+ /// Get a stock AWS retry strategy from a string.
+ static std::shared_ptr<S3RetryStrategy> GetS3RetryStrategy(const
std::string& name,
Review Comment:
This should probably be a static method of `S3RetryStrategy` instead. Also,
the `name` parameter should be explained in the docstring.
##########
python/pyarrow/includes/libarrow_fs.pxd:
##########
@@ -164,6 +167,8 @@ cdef extern from "arrow/filesystem/api.h" namespace
"arrow::fs" nogil:
int load_frequency
CS3ProxyOptions proxy_options
CS3CredentialsKind credentials_kind
+ shared_ptr[CS3RetryStrategy] retry_strategy
+ int retry_max_attempts
Review Comment:
`int retry_max_attempts` doesn't seem to actually exist?
##########
cpp/src/arrow/filesystem/s3fs.h:
##########
@@ -152,6 +152,10 @@ struct ARROW_EXPORT S3Options {
S3Options();
+ /// Get a stock AWS retry strategy from a string.
+ static std::shared_ptr<S3RetryStrategy> GetS3RetryStrategy(const
std::string& name,
+ long
retry_attempts);
Review Comment:
Can you make this `int64_t max_attempts`? (i.e. use `int64_t` instead of
`long`, and name the argument better)
##########
cpp/src/arrow/filesystem/s3fs.h:
##########
@@ -152,6 +152,10 @@ struct ARROW_EXPORT S3Options {
S3Options();
+ /// Get a stock AWS retry strategy from a string.
+ static std::shared_ptr<S3RetryStrategy> GetS3RetryStrategy(const
std::string& name,
Review Comment:
Also, instead of passing a made-up name, I would favour two separate methods
`GetAwsDefaultRetryStrategy` and `GetAwsStandardRetryStrategy`.
##########
cpp/src/arrow/filesystem/s3fs.cc:
##########
@@ -380,6 +411,23 @@ Result<S3Options> S3Options::FromUri(const std::string&
uri_string,
return FromUri(uri, out_path);
}
+std::shared_ptr<S3RetryStrategy> GetS3RetryStrategy(const std::string& name,
+ long retry_attempts) {
+ if (name == "aws_standard") {
+ auto aws_standard = std::make_shared<AwsRetryStrategy>(
+ std::make_shared<Aws::Client::StandardRetryStrategy>(retry_attempts));
+ return std::static_pointer_cast<S3RetryStrategy>(aws_standard);
Review Comment:
I don't think the `static_pointer_cast` is required since implicit
conversion should be available.
##########
python/pyarrow/_s3fs.pyx:
##########
@@ -157,11 +157,16 @@ cdef class S3FileSystem(FileSystem):
'port': 8020, 'username': 'username',
'password': 'password'})
allow_bucket_creation : bool, default False
- Whether to allow CreateDir at the bucket-level. This option may also
be
+ Whether to allow CreateDir at the bucket-level. This option may also be
passed in a URI query parameter.
allow_bucket_deletion : bool, default False
- Whether to allow DeleteDir at the bucket-level. This option may also
be
+ Whether to allow DeleteDir at the bucket-level. This option may also be
passed in a URI query parameter.
+ retry_strategy : str, default None
+ The name of the retry strategy to use with S3. Valid values are
+ "aws_standard" and "aws_default". Ignore with None.
+ retry_max_attempts : int, default 3
+ The maximum number of attempts to pass to AWS retry strategies.
Review Comment:
I think it is a bad design choice to use two separate arguments for this.
I would favour a single argument, perhaps together with more structured
objects, e.g.:
```python
class AwsDefaultRetryStrategy(object):
def __init__(self, max_attempts=3):
self.max_attempts = max_attempts
```
##########
python/pyarrow/_s3fs.pyx:
##########
@@ -157,11 +157,16 @@ cdef class S3FileSystem(FileSystem):
'port': 8020, 'username': 'username',
'password': 'password'})
allow_bucket_creation : bool, default False
- Whether to allow CreateDir at the bucket-level. This option may also
be
+ Whether to allow CreateDir at the bucket-level. This option may also be
passed in a URI query parameter.
allow_bucket_deletion : bool, default False
- Whether to allow DeleteDir at the bucket-level. This option may also
be
+ Whether to allow DeleteDir at the bucket-level. This option may also be
passed in a URI query parameter.
+ retry_strategy : str, default None
+ The name of the retry strategy to use with S3. Valid values are
+ "aws_standard" and "aws_default". Ignore with None.
Review Comment:
Talking about "default" here would introduce confusion with "aws_default"
(the AWS SDK's fault for poor naming, between "standard", "default" and
"legacy"...)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]