AlJohri commented on code in PR #2455:
URL: https://github.com/apache/iceberg-rust/pull/2455#discussion_r3324841457
##########
crates/storage/opendal/src/lib.rs:
##########
@@ -326,9 +326,17 @@ impl OpenDalStorage {
}
};
- // Transient errors are common for object stores; however there's no
- // harm in retrying temporary failures for other storage backends as
well.
- let operator = operator.layer(RetryLayer::new());
+ // Apply observability/resilience layers. TimeoutLayer must be
+ // inside RetryLayer so each retry attempt is independently
+ // bounded — without a per-attempt timeout, a future parked on a
+ // silently dropped TCP connection never produces an `Err` and
+ // RetryLayer cannot retry, leaving the caller hung indefinitely.
+ // See:
https://opendal.apache.org/docs/rust/opendal/layers/struct.TimeoutLayer.html
+ //
+ // Transient errors are common for object stores; we retry temporary
+ // failures with exponential backoff. The retry behavior also
+ // benefits non-object-store backends.
+ let operator =
operator.layer(TimeoutLayer::new()).layer(RetryLayer::new());
Review Comment:
In this PR I haven't made it configurable at all and just used the OpenDAL
defaults but I can definitely update the PR to make it configurable if that's
desirable.
I was originally thinking it may be worth just adding the TimeoutLayer in
place with the existing defaults similar to the RetryLayer and doing
configurability changes in a follow up, but happy to do it now as well.
I didn't see any knobs for configuring the object level retries and timeouts
in the table behavior properties link, but looking at other implementations, I
could add:
- `s3.connect-timeout` ([matches
pyiceberg](https://github.com/apache/iceberg-python/blob/5da8186d0a77456e2da2a0286bc016af94c043ed/pyiceberg/io/__init__.py#L60)
and
[pyarrow](https://github.com/apache/iceberg-python/blob/5da8186d0a77456e2da2a0286bc016af94c043ed/pyiceberg/io/pyarrow.py#L454-L455))
- `s3.request-timeout` ([matches
pyiceberg](https://github.com/apache/iceberg-python/blob/5da8186d0a77456e2da2a0286bc016af94c043ed/pyiceberg/io/__init__.py#L61)
and
[pyarrow](https://github.com/apache/iceberg-python/blob/5da8186d0a77456e2da2a0286bc016af94c043ed/pyiceberg/io/pyarrow.py#L457-L458))
- `s3.retry.num-retries` ([matches
java](https://github.com/apache/iceberg/blob/0052942699cd7b5e098d54a958827a911d28ac94/aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIOProperties.java#L475-L478))
- `s3.retry.min-wait-ms` ([matches
java](https://github.com/apache/iceberg/blob/0052942699cd7b5e098d54a958827a911d28ac94/aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIOProperties.java#L480-L483))
- `s3.retry.max-wait-ms` ([matches
java](https://github.com/apache/iceberg/blob/0052942699cd7b5e098d54a958827a911d28ac94/aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIOProperties.java#L485-L488))
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]